We present a new theory of homogeneous volatility (and variance) estimators for arbitrary stochastic processes. The main tool of our theory is the parsimonious encoding of all the information contained in the OHLC prices for a given time interval by the joint distributions of the high-minus-open, low-minus-open and close-minus-open values, whose analytical expression is derived exactly for Wiener processes with drift. The efficiency of the newly proposed estimators is favorably compared with that of the Garman-Klass, Rogers-Satchell and maximum likelihood estimators.
1. INTRODUCTION

Volatility, defined as the standard deviation of the increments of the log- 
price over a specific time interval, is a universally used risk indicator. While 
the growing availability of high-frequency tick-by-tick price time series has 
permitted the development of new efficient volatility estimators (see, for 
instance, Yang and Zhang (2000), Corsi et al. (2001), Andersen et al. (2003), 
Aït-Sahalia (2005), Zhang et al. (2005)), most historical time series as well
as databases of price time series, for the many tens of thousands of assets
(stocks, commodities, bonds, currencies, derivatives and so on) that exist
worldwide, only record prices in time steps coarse-grained for convenience
(which is often daily). However, it is common practice that not just one 
(close) price is recorded for a given time step, but four of them, called 
the open-high-low-close (OHLC) of the price for that given interval. It is 
natural to exploit these four recorded values per time step to develop better 
volatility estimators. 

Rather than just using the time series of close prices, here we present a
comprehensive theory of homogeneous volatility (and variance) estimators
of arbitrary stochastic processes that fully exploit the OHLC prices. For
this, we develop the theory of most efficient point-wise homogeneous OHLC
volatility estimators, valid for any price process. We introduce the "quasi-unbiased
estimators", which can address any type of desirable constraint.
The main tool of our theory is the parsimonious encoding of all the infor- 
mation contained in the OHLC prices for a given time interval in the form of 
general "diagrams" associated with the joint distributions of the high-minus- 
open, low-minus-open and close-minus-open values. The diagrams can be 
tailored to yield the most efficient estimators associated with any statistical
properties of the underlying log-price stochastic process. Applied to Wiener 
processes for log-prices with drift, we provide explicit analytical expressions 
for the most efficient point-wise volatility and variance estimators, based on 
the analytical expression of the joint distribution of the high-minus-open, 
low-minus-open and close-minus-open values. 

Our work improves on the following papers. Garman and Klass (G&K)
(1980) introduced a quadratic estimator for the variance of the Wiener process
with drift for the log-price, which has rather low variance but which is
biased for non-zero drifts. Parkinson (1980) proposed a simple quadratic
variance estimator proportional to $(H-L)^2$, which uses only a part of
the information available from OHLC prices. Rogers and Satchell (R&S)
(1991, 1994) introduced another quadratic estimator for the variance of the
Wiener process with drift, which is unbiased for all drifts (its expectation
equals the variance of the process) and has a larger variance, which is the
same for all drifts. Both G&K
and R&S estimators are focused on the variance, and do not present estima- 
tors for the volatility, which is of obvious interest for financial applications. 
Furthermore, the variance of their estimators is not provided for non-zero 
drifts. Magdon-Ismail and Atiya (2003) obtain a maximum likelihood (ML) 
volatility estimator based on the joint distribution of the high and low, 
previously obtained by Domine (1996). Their estimator does not use the
close price and is thus less efficient than the ML estimator using the full 
information of the OHLC, as shown here. In addition, we will show that 
the ML estimator is not the most efficient. Yang and Zhang (2000) pro- 
duced an unbiased and efficient quadratic variance estimator, taking into 
account the OHLC of log-prices for n > 1 consecutive days. Their main 
novelty is to take into account the possible existence of jumps (or gaps) of 
prices from yesterday's close to today's open. Their minimization of
the variance of their estimators requires the estimation of expectations of a 
quadratic form of the OHLC which they only partly achieve due to the lack 
of knowledge of the full joint distribution, which we offer in the Appendix. 
Chan and Lien (2003) compared the empirical effectiveness of four estimators:
the Parkinson, G&K and R&S ones, and the naive excursion range
$H-L$ estimator. In sum, the present paper can be viewed as providing the
full underpinning theory of all these previous works, since we are able to 
express efficient estimators in the presence of arbitrary constraints from the 
explicit knowledge of the joint distribution of the OHLC log-prices. 

The paper is organized as follows. Section 2 describes the properties of the 
stochastic processes for which our theory of most efficient homogeneous 
estimators is developed. Section 3 derives the general expressions for the 
most efficient volatility and variance OHLC estimators. Section 4 provides 
detailed analytical results on the statistical properties of the most efficient 
homogeneous estimators, for the case of Wiener process with drift. Section 5 
compares the exact analytical results with those obtained using numerical 
simulations of millions of realizations of the Wiener process with drift. Sec- 
tion 6 concludes. The Appendix presents the joint probability density func- 
tion of the high-minus-open, low-minus-open and close-minus-open values 
for the Wiener process with drift. 
2. VOLATILITY OF STOCHASTIC PROCESSES: MODELS, DEFINITIONS AND PROPERTIES

2.1. Volatility of simple stochastic log-price process 

The goal of this paper is to construct efficient estimators for the volatility of 
log-price processes. First, we specify the general properties of the stochastic 
processes to which our estimators will be applied. 

Let us consider the stochastic process $X(t)$, which is interpreted as the log-price of some asset at time $t$. Its volatility over the time interval of duration $T$ is by definition the standard deviation of the increment $X(t+T)-X(t)$. We assume that $X(t)$ has stationary increments. Accordingly, for simplicity but without loss of generality, we can take $t=0$ and $X(0)=0$ and choose the time scale such that $T=1$. All the rest of the paper is based on the following definition of volatility:

Definition 2.1 The volatility $\sigma$ of the stochastic process $X(t)$ is equal to the square root of the variance of its increment per unit time

$$\sigma = \sqrt{D}, \qquad D = \mathrm{Var}[X(1)].$$

The estimators of the volatility $\sigma$ and of the variance $D = \sigma^2$ will be denoted respectively $\hat\sigma$ and $\hat D$.

We consider the following class of stochastic processes

$$X(t) = \sigma A(t,\gamma), \tag{1}$$

where $A(t,\gamma)$ is an auxiliary stochastic process, whose statistical properties are assumed to be known for any given value of the parameter $\gamma$. We assume additionally that the expectation and the variance of the stochastic process $A(t,\gamma)$ are finite:

$$\mathrm{E}[A(t,\gamma)] < \infty, \qquad \sigma_0^2(t,\gamma) = \mathrm{Var}[A(t,\gamma)] < \infty.$$
It follows from (1) and from the definition of volatility that the stochastic process $A(t,\gamma)$ has unit volatility: $\sigma_0(1,\gamma) = 1$.

Let us introduce the following auxiliary stochastic process

$$Y(t) = \frac{X(t)}{\sigma_0(T,\gamma)} = \sigma B(t,\gamma), \qquad B(t,\gamma) = \frac{A(t,\gamma)}{\sigma_0(T,\gamma)}. \tag{2}$$

By construction, the variance of the increments of the "normalized" stochastic process $Y(t)$ over a time interval of arbitrary duration $T$ coincides with the variance of the increments of the original process $X(t)$ over the unit time interval of duration $T = 1$:

$$\mathrm{Var}[B(T,\gamma)] = 1, \qquad \mathrm{Var}[Y(T)] = \sigma^2.$$

Let us consider particular examples of the stochastic processes $X(t)$ given by (1) and of the corresponding $Y(t)$ defined by (2):

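Example 2.1 The simplest and most common log-price process is the Wiener process

$$X(t) = \mu t + \sigma W(t), \tag{3}$$

where $W(t)$ is the standard Wiener process, such that $\mathrm{E}[W(t)] = 0$, $\mathrm{Var}[W(t)] = t$, while $\mu$ is the drift parameter. In this case, $\sigma_0(T,\gamma) = \sqrt{T}$, so that the auxiliary stochastic process $Y(t)$ (2) takes the form

$$Y(t) = X(t)/\sqrt{T} = \sigma B(t,\gamma),$$

where

$$B(t,\gamma) = \nu(\tau,\gamma), \qquad \nu(\tau,\gamma) = \gamma\tau + W(\tau). \tag{4}$$

Here, we introduced the "normalized" time $\tau$ and the parameter $\gamma$:

$$\tau = \frac{t}{T}, \qquad \gamma = \frac{\mu}{\sigma}\sqrt{T}. \tag{5}$$

Equation (4) holds in law, because $W(t)/\sqrt{T}$ and $W(t/T)$ are identically distributed processes.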
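The scaling (4)-(5) can be checked numerically. The sketch below, assuming NumPy and a uniform time grid (both our choices, not the paper's), drives $X(t)$ and $\nu(\tau,\gamma)$ with the same Gaussian increments, writing $W(t) = \sqrt{T}\,\tilde W(t/T)$, and verifies that $X(t)/\sqrt{T} = \sigma\,\nu(t/T,\gamma)$ on the grid. The helper names are hypothetical.

```python
import numpy as np


def drift_parameter(mu: float, sigma: float, T: float) -> float:
    """Normalized drift gamma = (mu / sigma) * sqrt(T), eq. (5)."""
    return mu / sigma * np.sqrt(T)


def paths_two_ways(mu, sigma, T, n_steps, rng):
    """Return X(t) on the grid t = tau*T and nu(tau, gamma) on tau in [0, 1],
    both driven by the same standard Wiener path W~ on [0, 1]."""
    tau = np.linspace(0.0, 1.0, n_steps + 1)
    dW = rng.standard_normal(n_steps) * np.sqrt(1.0 / n_steps)
    W_tilde = np.concatenate(([0.0], np.cumsum(dW)))   # standard Wiener on [0, 1]
    X = mu * tau * T + sigma * np.sqrt(T) * W_tilde      # eq. (3), W(t) = sqrt(T) W~(t/T)
    gamma = drift_parameter(mu, sigma, T)
    nu = gamma * tau + W_tilde                           # eq. (4)
    return X, nu, gamma


rng = np.random.default_rng(0)
X, nu, gamma = paths_two_ways(mu=0.1, sigma=0.2, T=4.0, n_steps=1000, rng=rng)
Y = X / np.sqrt(4.0)                                     # Y(t) = X(t)/sqrt(T)
print(gamma, np.allclose(Y, 0.2 * nu))
```

Because both paths share one noise realization, the check is pathwise here; with independent noise the identity would only hold in distribution.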
Remark 2.1 In practical applications, the parameter $\gamma$, which is proportional to the drift of the stochastic process $X(t)$ (3), is generally unknown. Our strategy is to proceed in two steps: (i) determine the most efficient volatility and variance estimators for a fixed value of $\gamma$, say $\gamma_0$; (ii) explore in detail the efficiency of the estimators for values of $\gamma$ that deviate from $\gamma_0$.



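Example 2.2 Let $X(t)$ be defined at discrete times $t = 0, 1, 2, \dots$ and let it satisfy the recurrence relation

$$X(n+1) = X(n) + \mu + \sigma\varepsilon_n, \qquad X(0) = 0, \qquad n = 0, 1, 2, \dots$$

where $\{\varepsilon_n\}$ is a sequence of iid random variables with zero expectation and unit variance $\mathrm{Var}[\varepsilon_n] = 1$. In order to estimate the volatility $\sigma$ from recorded values of $X(n)$ over a discrete time interval of duration $N$, it is convenient to introduce the "normalized" discrete-time process

$$Y(n) = \frac{X(n)}{\sqrt{N}} = \sigma B(n,\gamma), \qquad B(n,\gamma) = \frac{\nu(n,\gamma)}{\sqrt{N}},$$

where

$$\nu(n,\gamma) = \gamma n + u(n), \qquad u(n) = \sum_{k=0}^{n-1}\varepsilon_k, \qquad n = 1, 2, \dots, \qquad \nu(0,\gamma) = 0, \qquad \gamma = \frac{\mu}{\sigma}. \tag{6}$$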
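A minimal simulation of Example 2.2 follows. It assumes, as one concrete fat-tailed choice, Student-t innovations rescaled to unit variance (a Student-t density with `df` degrees of freedom decays as $|x|^{-1-\mathrm{df}}$, so $p = \mathrm{df}$ in Remark 2.2), and it assumes $\gamma = \mu/\sigma$ as in (6). It checks that $X(n) = \sigma\,\nu(n,\gamma)$ and that $\mathrm{Var}[Y(N)] \approx \sigma^2$; `simulate_discrete` is a hypothetical helper.

```python
import numpy as np


def simulate_discrete(mu, sigma, N, n_paths, df, rng):
    """Simulate X(n+1) = X(n) + mu + sigma*eps_n, X(0) = 0, for n_paths paths.

    eps_n are Student-t(df) variables rescaled to unit variance (requires df > 2).
    Returns X(n), nu(n, gamma) and Y(n) = X(n)/sqrt(N) for n = 0..N."""
    eps = rng.standard_t(df, size=(n_paths, N)) * np.sqrt((df - 2.0) / df)
    # u(0) = 0 and u(n) = eps_0 + ... + eps_{n-1}
    u = np.concatenate((np.zeros((n_paths, 1)), np.cumsum(eps, axis=1)), axis=1)
    n = np.arange(N + 1)
    X = mu * n + sigma * u          # closed-form solution of the recurrence
    gamma = mu / sigma              # assumed normalization, so that X = sigma * nu
    nu = gamma * n + u              # eq. (6)
    Y = X / np.sqrt(N)              # normalized process
    return X, nu, Y


rng = np.random.default_rng(1)
X, nu, Y = simulate_discrete(mu=0.05, sigma=2.0, N=50, n_paths=20_000, df=5.0, rng=rng)
print(Y[:, -1].var())  # sample variance of Y(N), to be compared with sigma**2 = 4
```

`df = 5` is chosen so that the fourth moment is also finite, which keeps the Monte Carlo check of the variance stable.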

Remark 2.2 If the random variables $\{\varepsilon_n\}$ are Gaussian, the stochastic process $X(n)$ can be interpreted as the discrete-time version of the process $X(t)$ defined by (3). More interesting is the case where $\{\varepsilon_n\}$ are non-Gaussian random variables, with a fat-tailed probability density $f(x) \sim |x|^{-1-p}$ for large $|x|$, with $p > 2$ ensuring that the variance exists (see McKenzie (2006) for an excellent review of the history of fat tails in financial returns).
2.2. OHLC volatility and variance estimators

Given the observed realization of the stochastic process $X(t)$ within some time interval $t \in (0,T)$ over $m$ points, $(t_1, t_2, \dots, t_{m-1}) \in (0,T)$, $t_m = T$, the most general expression of the estimator of the volatility of $X(t)$ is the function

$$\hat\sigma_m = G(X_1, X_2, \dots, X_m)$$

of the recorded values

$$X_1 = X(t_1), \quad X_2 = X(t_2), \quad \dots, \quad X_m = X(T).$$

Of particular interest for its widespread use and parsimonious representation 
of a given realization of the process X(t) over a finite time interval is the case 
m = 3 that corresponds, in particular, to OHLC estimators. The four letters 
OHLC stand respectively for Open, High, Low and Close. In the following, 
we focus on this case due to its special significance, while it is understood 
that one can generalize the theory developed here to higher-order multipoint 
estimators corresponding to any value m > 3. 

Without loss of generality, we set $X(0) = 0$ (in practice, the relevant quantities are simply decreased by the opening value at time $t = 0$). Then, the high, low and close values of a given realization of the stochastic process $X(t)$ within the time interval $t \in (0,T)$ are

$$H = \sup_{t\in(0,T)} X(t), \qquad L = \inf_{t\in(0,T)} X(t), \qquad C = X(T). \tag{7}$$
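In practice $(H, L, C)$ in (7) are read from recorded prices after subtracting the open. The hypothetical helper below does this for a sampled log-price path. Note that a discretely sampled path can only underestimate the true supremum and overestimate the true infimum, so on coarse grids $H$ and $L$ are biased toward zero.

```python
import numpy as np


def ohlc_relative(log_prices):
    """Return (H, L, C) of a sampled log-price path, measured from the open (X(0) = 0).

    log_prices[0] is the open; the open itself is included in max/min,
    which enforces H >= 0 and L <= 0 as for a continuous path."""
    x = np.asarray(log_prices, dtype=float)
    x = x - x[0]
    return x.max(), x.min(), x[-1]


H, L, C = ohlc_relative(np.log([100.0, 104.0, 97.0, 101.0]))
print(H, L, C)  # high-, low- and close-minus-open log-prices
```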



Definition 2.2 Among all three-point volatility and variance estimators, the OHLC estimators are defined as functions of only the three measures (high, low and close) of the realization of the stochastic process $X(t)$ within the time interval $t \in (0,T)$ defined by (7). Specifically, OHLC volatility and variance estimators are functions which can be written as follows:

$$\hat\sigma = \hat\sigma(H,L,C), \qquad \hat D = \hat D(H,L,C). \tag{8}$$
Such OHLC estimators are well known to be more efficient than the equidistant three-point estimators corresponding to $t_k = kT/3$ $(k = 1, 2, 3)$.

2.3. Quadratic OHLC variance and volatility estimators 

Almost all known variance OHLC estimators are quadratic forms of $H$, $L$ and $C$. Let us introduce the vector $\mathbf{X}^\top = (H, L, C)$, where $^\top$ denotes the transpose operation. Let us denote by $Q$ any positive-definite $3\times 3$ matrix.

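Definition 2.3 The variance OHLC estimator $\hat D$ is called quadratic if it can be expressed as a quadratic form

$$\hat D = \frac{1}{\sigma_0^2(T,\gamma)}\,\mathbf{X}^\top Q\,\mathbf{X} = \mathbf{Y}^\top Q\,\mathbf{Y}, \qquad \text{where } \mathbf{Y} = \frac{\mathbf{X}}{\sigma_0(T,\gamma)}. \tag{9}$$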

In turn, the volatility OHLC estimator $\hat\sigma$ is called quadratic if it can be represented in the form

$$\hat\sigma = \frac{1}{\sigma_0(T,\gamma)}\sqrt{\mathbf{X}^\top Q\,\mathbf{X}} = \sqrt{\mathbf{Y}^\top Q\,\mathbf{Y}}. \tag{10}$$

Two well-known OHLC estimators are quadratic, as shown in Examples 2.3 and 2.4.

Example 2.3 Rogers and Satchell (1991) have suggested the following quadratic OHLC variance estimator

$$\hat D_{RS} = \frac{1}{T}\left[H(H-C) + L(L-C)\right]. \tag{11}$$

We will refer to this estimator as the R&S variance estimator. The corresponding expression

$$\hat\sigma_{RS} = \frac{1}{\sqrt{T}}\sqrt{H(H-C) + L(L-C)} \tag{12}$$

will be called the R&S volatility estimator.
The R&S variance estimator (11) has the nice property of being unbiased. Namely, for the Wiener process defined by (3), and for any $\sigma$ and $\mu$ (i.e. for any value of the parameter $\gamma$), the expected value of the R&S variance estimator (11) is equal to the variance of the original process over the time interval $[0,1]$: $\mathrm{E}[\hat D_{RS}] = \sigma^2$.
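The unbiasedness of (11) for any drift can be checked by Monte Carlo without discretization bias. The sketch below works with the canonical process $\nu(\tau,\gamma)$ on $[0,1]$ (so $\sigma = T = 1$), draws $C = \gamma + Z$ exactly and then, given $C$, draws $H$ and $L$ from the exact Brownian-bridge laws of the maximum and minimum, $H = \tfrac12\big(C + \sqrt{C^2 - 2\ln U_1}\big)$ and $L = \tfrac12\big(C - \sqrt{C^2 - 2\ln U_2}\big)$. As a simplifying assumption, $H$ and $L$ are drawn independently given $C$: this gives the correct joint laws of $(H,C)$ and of $(L,C)$ but not of $(H,L)$, which is enough for $\mathrm{E}[\hat D_{RS}]$ (a sum of an $(H,C)$ term and an $(L,C)$ term) but not for its variance. Function names are hypothetical.

```python
import numpy as np


def rogers_satchell_variance(H, L, C, T=1.0):
    """R&S variance estimator, eq. (11)."""
    return (H * (H - C) + L * (L - C)) / T


def sample_canonical_hlc(gamma, n, rng):
    """Sample (H, L, C) of nu(tau) = gamma*tau + W(tau) on [0, 1].

    C is exact. Given C, H and L follow the exact Brownian-bridge laws of the
    maximum and minimum (which do not depend on the drift), but they are drawn
    independently of each other, so only the (H, C) and (L, C) laws are exact."""
    C = gamma + rng.standard_normal(n)
    U1 = 1.0 - rng.random(n)  # uniform on (0, 1], avoids log(0)
    U2 = 1.0 - rng.random(n)
    H = 0.5 * (C + np.sqrt(C**2 - 2.0 * np.log(U1)))
    L = 0.5 * (C - np.sqrt(C**2 - 2.0 * np.log(U2)))
    return H, L, C


rng = np.random.default_rng(2)
for gamma in (0.0, 1.5):
    H, L, C = sample_canonical_hlc(gamma, 200_000, rng)
    print(gamma, rogers_satchell_variance(H, L, C).mean())  # close to 1 for every gamma
```

With this construction $H(H-C) = -\tfrac12\ln U_1$, an $\mathrm{Exp}(1)/2$ variable whatever the values of $C$ and $\gamma$, which is a compact way to see why each half of (11) has expectation $\sigma^2/2$ (for $T = 1$) regardless of the drift.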

Example 2.4 Another quadratic OHLC variance estimator was suggested by Garman and Klass (1980). This G&K variance estimator is defined by

$$\hat D_{GK} = \frac{1}{T}\left[k_1 (H-L)^2 - k_2\big(C(H+L) - 2HL\big) - k_3 C^2\right], \tag{13}$$

where $k_1 = 0.511$, $k_2 = 0.019$, $k_3 = 0.383$. The square root of expression (13) will be referred to as the G&K volatility estimator.

For the Wiener process (3), the G&K variance estimator is unbiased only if the drift is equal to zero. In general, $\mathrm{E}[\hat D_{GK}] \neq \sigma^2$ if $\mu \neq 0$ ($\gamma \neq 0$). This bias is a shortcoming of the G&K variance estimator. Its advantage is that, for zero drift $\mu = 0$ ($\gamma = 0$), its variance is significantly smaller than the variance of the R&S variance estimator.
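Both estimators are straightforward to code. The sketch below implements (11) and (13) for scalars or NumPy arrays (hypothetical helper names) and checks, on a hand-computed case and on a rescaling, that both are second-order homogeneous in $(H, L, C)$, the property exploited in Section 2.4.

```python
import numpy as np

K1, K2, K3 = 0.511, 0.019, 0.383  # G&K coefficients given with (13)


def rogers_satchell_variance(H, L, C, T=1.0):
    """R&S variance estimator, eq. (11)."""
    return (H * (H - C) + L * (L - C)) / T


def garman_klass_variance(H, L, C, T=1.0):
    """G&K variance estimator, eq. (13)."""
    return (K1 * (H - L) ** 2 - K2 * (C * (H + L) - 2.0 * H * L) - K3 * C**2) / T


# Hand check for H = 1, L = -1, C = 0:
#   R&S: 1*(1 - 0) + (-1)*(-1 - 0) = 2
#   G&K: 0.511*4 - 0.019*(0 - 2*(1)*(-1)) - 0 = 2.044 - 0.038 = 2.006
print(rogers_satchell_variance(1.0, -1.0, 0.0), garman_klass_variance(1.0, -1.0, 0.0))

# Second-order homogeneity: scaling (H, L, C) by a multiplies both estimators by a**2
a, hlc = 3.0, (0.7, -0.4, 0.1)
scaled = tuple(a * v for v in hlc)
print(np.isclose(garman_klass_variance(*scaled), a**2 * garman_klass_variance(*hlc)))
```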



2.4. Homogeneous variance and volatility estimators

In order to more clearly understand the key properties of any quadratic estimators, it is instructive to introduce "generalized" R&S and G&K estimators for the general stochastic process $X(t)$ defined by (1). For definiteness, we will focus here on the "generalized" R&S variance estimator obtained by replacing $T$ by $\sigma_0^2(T,\gamma)$ in (11):

$$\hat D_{RS} = \frac{1}{\sigma_0^2(T,\gamma)}\left[H(H-C) + L(L-C)\right].$$

Using relations (2), it can be written in the form

$$\hat D_{RS} = \sigma^2 d_{RS}, \qquad d_{RS} = \mathcal{H}(\mathcal{H}-\mathcal{C}) + \mathcal{L}(\mathcal{L}-\mathcal{C}), \tag{14}$$
where $d_{RS}$ is a function of the high, low and close values of the auxiliary stochastic process $B(t,\gamma)$ defined by (2) within the interval $t \in (0,T)$:

$$\mathcal{H} = \sup_{t\in(0,T)} B(t,\gamma), \qquad \mathcal{L} = \inf_{t\in(0,T)} B(t,\gamma), \qquad \mathcal{C} = B(T,\gamma). \tag{15}$$

Accordingly, the R&S volatility estimator is equal to

$$\hat\sigma_{RS} = \sigma s_{RS}, \qquad s_{RS} = \sqrt{\mathcal{H}(\mathcal{H}-\mathcal{C}) + \mathcal{L}(\mathcal{L}-\mathcal{C})}. \tag{16}$$

The R&S variance estimator $\hat D_{RS}$ given by (14) has the following important property: it is equal to the product of the (unknown) variance $\sigma^2$ of the original process $X(t)$ defined by (1) and of the random factor $d_{RS}$. The statistical properties of $d_{RS}$ are expressed via the statistical properties of the auxiliary process $B(t,\gamma)$, which are known by definition. Therefore, for a given $\gamma$, the statistical properties of $d_{RS}$ do not depend on the variance $\sigma^2$ of the original process $X(t)$ defined in (1). Moreover, since the R&S variance estimator is unbiased, the expectation of $d_{RS}$ is equal to unity: $\mathrm{E}[d_{RS}] = 1$. Correspondingly, one can quantitatively characterize the relative error of the R&S variance estimator by the variance of the factor $d_{RS}$,

$$\mathrm{Var}[d_{RS}] = \mathrm{E}[d_{RS}^2] - 1,$$

which does not depend (for a given $\gamma$) on the sought variance $\sigma^2$. Figuratively speaking, one can read relation (14) as if the sought variance $\sigma^2$ were a fixed, known scale, while all the randomness of the estimate resides in the factor $d_{RS}$. Thus, the R&S variance estimator is all the more efficient, the smaller the variance of the factor $d_{RS}$.

Definition 2.4 If the OHLC variance estimator is represented in the form

$$\hat D = \sigma^2 d, \tag{17}$$

where the factor

$$d = d(\mathcal{H}, \mathcal{L}, \mathcal{C}) \tag{18}$$

depends only on the high, low and close values (15) of the auxiliary stochastic process $B(t,\gamma)$ and does not depend (for any given $\gamma$) on the variance $\sigma^2$ of the original stochastic process $X(t)$, then we refer to the factor $d$ (18) as the canonical variance estimator. Similarly, if the volatility OHLC estimator is represented in the following form, analogous to (16),

$$\hat\sigma = \sigma s, \qquad s = s(\mathcal{H}, \mathcal{L}, \mathcal{C}), \tag{19}$$

then the factor $s$ is the canonical volatility estimator.

Obviously, the estimators (17) and (19) are unbiased, for a given $\gamma = \gamma_0$, if

$$\mathrm{E}[d\,|\,\gamma_0] = 1, \qquad \mathrm{E}[s\,|\,\gamma_0] = 1. \tag{20}$$

Here and below, we use the notations $\mathrm{E}[\dots|\gamma_0]$, $\mathrm{Var}[\dots|\gamma_0]$ for conditional expectations and variances, under the condition that the parameter $\gamma$ is equal to $\gamma_0$.

Remark 2.3 In general, all volatility $\hat\sigma$ and variance $\hat D$ estimators (8) are functions of $H$, $L$ and $C$. However, not all of them admit canonical estimators $s$ and $d$ (18), (19) depending on the variables $\mathcal{H}$, $\mathcal{L}$, $\mathcal{C}$ (15). In the present paper, we explore only homogeneous estimators, defined below, which are expressed via canonical estimators.

Definition 2.5 The OHLC variance estimator is called homogeneous if it can be represented in the form

$$\hat D(H,L,C) = h_2(H,L,C)/\sigma_0^2(T,\gamma), \tag{21}$$

where $h_2(H,L,C)$ is a second-order homogeneous function. Analogously, the volatility estimator is called homogeneous if it can be represented in the form

$$\hat\sigma(H,L,C) = h_1(H,L,C)/\sigma_0(T,\gamma), \tag{22}$$

where $h_1$ is a first-order homogeneous function.
Theorem 2.1 The homogeneous OHLC variance estimators $\hat D(H,L,C)$ (21) admit the representation (17), (18).

Proof. It follows from (7), (15) and definition (2) of the auxiliary stochastic process $B(t,\gamma)$ that the following equalities are true

$$H = a\mathcal{H}, \qquad L = a\mathcal{L}, \qquad C = a\mathcal{C}, \qquad a = \sigma\,\sigma_0(T,\gamma). \tag{23}$$

Thus, one can rewrite relation (21) in the form

$$\hat D(H,L,C) = h_2(a\mathcal{H}, a\mathcal{L}, a\mathcal{C})/\sigma_0^2(T,\gamma).$$

From the homogeneity property of the second-order homogeneous function

$$h_2(a\mathcal{H}, a\mathcal{L}, a\mathcal{C}) = a^2 h_2(\mathcal{H}, \mathcal{L}, \mathcal{C}),$$

we obtain

$$\hat D(H,L,C) = \sigma^2 h_2(\mathcal{H}, \mathcal{L}, \mathcal{C}).$$

Thus, the homogeneous estimators (21) are reduced to (17) and (18), where $d = h_2(\mathcal{H}, \mathcal{L}, \mathcal{C})$. □

Remark 2.4 One can prove in a similar way that homogeneous volatility estimators (22) are reduced to the form (19).

Definition 2.6 The variance (21) and volatility (22) estimators are the most efficient homogeneous estimators, for a given $\gamma_0$, if the corresponding canonical variance and volatility estimators satisfy relations (20), while their variances

$$\mathrm{Var}[d\,|\,\gamma_0] = \mathrm{E}[d^2\,|\,\gamma_0] - 1, \qquad \mathrm{Var}[s\,|\,\gamma_0] = \mathrm{E}[s^2\,|\,\gamma_0] - 1,$$

are the smallest among the variances of all possible canonical homogeneous estimators which are unbiased at $\gamma_0$.
Remark 2.5 All quadratic estimators are homogeneous. This results from their definition (9), since the quadratic form $\mathbf{X}^\top Q\mathbf{X}$ is a second-order homogeneous function of its argument $\mathbf{X}$. Analogously, the quadratic volatility estimator (10) is homogeneous, because $\sqrt{\mathbf{X}^\top Q\mathbf{X}}$ is a homogeneous function of order one. In particular, the quadratic R&S (11) and G&K (13) variance estimators are homogeneous.

More insight into homogeneous OHLC estimators can be obtained by representing $(H, L, C)$ in the following "spherical" (or geographic) coordinates, which embody the homogeneity property parsimoniously:

$$H = R\cos\Theta\cos\Phi, \qquad L = R\cos\Theta\sin\Phi, \qquad C = R\sin\Theta,$$

$$R = \sqrt{H^2 + L^2 + C^2}, \qquad \Theta = \arctan\left(\frac{C}{\sqrt{H^2 + L^2}}\right), \qquad \Phi = \arctan\left(\frac{L}{H}\right). \tag{24}$$

Theorem 2.2 Any variance estimator of the form

$$\hat D = \frac{R^2}{\sigma_0^2(T,\gamma)}\,\varphi(\Theta,\Phi), \tag{25}$$

where $R$, $\Theta$ and $\Phi$ are given by (24) and $\varphi(\Theta,\Phi)$ is an arbitrary function, is a homogeneous variance estimator. Conversely, any homogeneous variance estimator (21) can be expressed in the form (25). Similarly,

$$\hat\sigma = \frac{R}{\sigma_0(T,\gamma)}\,\psi(\Theta,\Phi), \tag{26}$$

where $\psi(\Theta,\Phi)$ is an arbitrary function, is a homogeneous volatility estimator, and conversely.



Proof. It follows from (24) that $R^2$ is a second-order homogeneous function of its arguments $(H,L,C)$, while $\Theta$ and $\Phi$ are zero-order homogeneous functions of the same arguments (so they take the same values whether computed from $(H,L,C)$ or from $(\mathcal{H},\mathcal{L},\mathcal{C})$). Accordingly, $\varphi(\Theta,\Phi)$ is a zero-order homogeneous function of $(H,L,C)$, while $R^2\varphi(\Theta,\Phi)$ is a second-order homogeneous function of $(H,L,C)$. Thus, due to Theorem 2.1, the estimator (25) is represented in homogeneous form as

$$\hat D = \sigma^2 d(\mathcal{H},\mathcal{L},\mathcal{C}), \qquad d(\mathcal{H},\mathcal{L},\mathcal{C}) = \mathcal{R}^2\varphi(\Theta,\Phi), \qquad \mathcal{R} = \sqrt{\mathcal{H}^2 + \mathcal{L}^2 + \mathcal{C}^2}. \tag{27}$$

In turn, it is obvious that any homogeneous estimator (21) is represented in the form (25), where

$$\varphi(\Theta,\Phi) = h_2(\cos\Theta\cos\Phi,\ \cos\Theta\sin\Phi,\ \sin\Theta).$$

Using a similar derivation, it is easy to prove that $\hat\sigma$ given by (26) is a homogeneous volatility estimator, i.e.,

$$\hat\sigma = \sigma s(\mathcal{H},\mathcal{L},\mathcal{C}), \qquad s = \mathcal{R}\,\psi(\Theta,\Phi). \tag{28}$$

□

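Remark 2.6 The inequalities

$$L \le C \le H, \qquad H \ge 0, \qquad L \le 0,$$

resulting from the definition of $H$, $L$, $C$, impose that $R$, $\Theta$ and $\Phi$ should satisfy

$$0 \le R < \infty, \qquad -\frac{\pi}{2} \le \Phi \le 0, \qquad s(\Phi) \le \Theta \le c(\Phi),$$

$$s(\phi) = \arctan(\sin\phi), \qquad c(\phi) = \arctan(\cos\phi).$$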
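The change of variables (24) and the admissible region of Remark 2.6 translate directly into code. In the sketch below (hypothetical helper names), `np.arctan2` is used in place of $\arctan(L/H)$ so that the degenerate case $H = 0$ is handled; for $H > 0$ the two coincide. The check draws random triples with $L \le 0 \le H$ and $L \le C \le H$ and confirms the round trip and the bounds $s(\Phi) \le \Theta \le c(\Phi)$.

```python
import numpy as np


def to_spherical(H, L, C):
    """(H, L, C) -> (R, Theta, Phi), eq. (24)."""
    R = np.sqrt(H**2 + L**2 + C**2)
    Theta = np.arctan2(C, np.hypot(H, L))   # latitude, in [-pi/2, pi/2]
    Phi = np.arctan2(L, H)                  # longitude, in [-pi/2, 0] when H >= 0 >= L
    return R, Theta, Phi


def from_spherical(R, Theta, Phi):
    """Inverse map (R, Theta, Phi) -> (H, L, C)."""
    return (R * np.cos(Theta) * np.cos(Phi),
            R * np.cos(Theta) * np.sin(Phi),
            R * np.sin(Theta))


def in_admissible_region(Theta, Phi, tol=1e-12):
    """Remark 2.6: -pi/2 <= Phi <= 0 and arctan(sin Phi) <= Theta <= arctan(cos Phi)."""
    return ((Phi >= -np.pi / 2 - tol) & (Phi <= tol)
            & (Theta >= np.arctan(np.sin(Phi)) - tol)
            & (Theta <= np.arctan(np.cos(Phi)) + tol))


rng = np.random.default_rng(4)
H = rng.random(1000)                     # H >= 0
L = -rng.random(1000)                    # L <= 0
C = L + (H - L) * rng.random(1000)       # L <= C <= H
R, Theta, Phi = to_spherical(H, L, C)
print(in_admissible_region(Theta, Phi).all(),
      np.allclose(from_spherical(R, Theta, Phi), (H, L, C)))
```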

Definition 2.7 We will refer to the functions $\varphi(\theta,\phi)$ and $\psi(\theta,\phi)$, defined respectively by (27) and (28), as the diagrams of the homogeneous OHLC variance and volatility estimators.

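Example 2.5 From the definitions (11) and (13), the diagrams of the R&S and G&K variance estimators are

$$\varphi_{RS}(\theta,\phi) = \cos^2\theta - \tfrac{1}{2}\sin 2\theta\,(\cos\phi + \sin\phi),$$

$$\varphi_{GK}(\theta,\phi) = k_1\cos^2\theta\,(\cos\phi - \sin\phi)^2 + k_2\left[\cos^2\theta\,\sin 2\phi - \tfrac{1}{2}\sin 2\theta\,(\cos\phi + \sin\phi)\right] - k_3\sin^2\theta. \tag{29}$$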
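As a consistency check of (29), the sketch below evaluates both diagrams and verifies that $R^2\varphi(\Theta,\Phi)$ reproduces the bracketed quadratic forms of (11) and (13) (with $T = 1$) on random admissible triples. The spherical conversion repeats (24); all names are hypothetical.

```python
import numpy as np

K1, K2, K3 = 0.511, 0.019, 0.383


def diagram_rs(theta, phi):
    """R&S diagram, first line of (29)."""
    return np.cos(theta) ** 2 - 0.5 * np.sin(2 * theta) * (np.cos(phi) + np.sin(phi))


def diagram_gk(theta, phi):
    """G&K diagram, second line of (29)."""
    return (K1 * np.cos(theta) ** 2 * (np.cos(phi) - np.sin(phi)) ** 2
            + K2 * (np.cos(theta) ** 2 * np.sin(2 * phi)
                    - 0.5 * np.sin(2 * theta) * (np.cos(phi) + np.sin(phi)))
            - K3 * np.sin(theta) ** 2)


rng = np.random.default_rng(5)
H, L = rng.random(500), -rng.random(500)
C = L + (H - L) * rng.random(500)                               # L <= C <= H
R = np.sqrt(H**2 + L**2 + C**2)
theta, phi = np.arctan2(C, np.hypot(H, L)), np.arctan2(L, H)   # eq. (24)

rs_direct = H * (H - C) + L * (L - C)
gk_direct = K1 * (H - L) ** 2 - K2 * (C * (H + L) - 2 * H * L) - K3 * C**2
print(np.allclose(R**2 * diagram_rs(theta, phi), rs_direct),
      np.allclose(R**2 * diagram_gk(theta, phi), gk_direct))
```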
It is likely that the R&S and G&K estimators are not the most efficient quadratic estimators for any given value $\gamma_0$ of the parameter $\gamma$. It is therefore natural to search for the most efficient quadratic estimators, at a given value $\gamma_0$, which might be more efficient than the R&S and G&K estimators. We will determine below the most efficient homogeneous volatility and variance OHLC estimators for any given $\gamma_0$. The following theorem summarizes the relations between the most efficient quadratic and the most efficient homogeneous OHLC estimators.

Theorem 2.3 Let $\mathrm{Var}_q[d|\gamma_0]$ and $\mathrm{Var}_q[s|\gamma_0]$ be the variances of the most efficient quadratic canonical OHLC estimators for a given $\gamma_0$. Let $\mathrm{Var}_h[d|\gamma_0]$ and $\mathrm{Var}_h[s|\gamma_0]$ be the variances of the most efficient homogeneous canonical OHLC estimators for the same given $\gamma_0$. Then, the following inequalities hold true

$$\mathrm{Var}_q[d\,|\,\gamma_0] \ge \mathrm{Var}_h[d\,|\,\gamma_0], \qquad \mathrm{Var}_q[s\,|\,\gamma_0] \ge \mathrm{Var}_h[s\,|\,\gamma_0]. \tag{30}$$

In other words, at a given value $\gamma_0$, the most efficient homogeneous OHLC estimator is no less efficient than the most efficient quadratic OHLC estimator.

Proof. Denoting by $\Omega_q$ the set of quadratic OHLC estimators, and by $\Omega_h$ the set of homogeneous OHLC estimators, we have $\Omega_q \subset \Omega_h$ (Remark 2.5). The inequalities (30) derive from this inclusion, since a minimum taken over a larger set cannot exceed the minimum over one of its subsets. □

3. DIAGRAMS OF MOST EFFICIENT OHLC HOMOGENEOUS ESTIMATORS

In this section, we derive the expressions for the most efficient (at $\gamma = \gamma_0$) homogeneous variance and volatility OHLC estimators, whose canonical estimators are given by expressions (27) and (28). To make clear that these estimators depend on $\gamma_0$, we will use the following notations for the diagrams of the most efficient homogeneous estimators: $\varphi(\theta,\phi;\gamma_0)$ and $\psi(\theta,\phi;\gamma_0)$.
We assume the existence of the joint probability density function (pdf) $Q(h,l,c;\gamma)$ of the random variables $(\mathcal{H},\mathcal{L},\mathcal{C})$ given by equalities (15). The pdf $Q(h,l,c;\gamma)$ depends on the parameter $\gamma$ and is defined by

$$Q(h,l,c;\gamma)\,dh\,dl\,dc = \Pr\{\mathcal{H} \in (h, h+dh),\ \mathcal{L} \in (l, l+dl),\ \mathcal{C} \in (c, c+dc)\}, \tag{31}$$

which expresses the probability that $(\mathcal{H},\mathcal{L},\mathcal{C})$ take specific values to within infinitesimal intervals. The Appendix gives the explicit expression of the pdf $Q(h,l,c;\gamma)$ for the special case of the Wiener process $\nu(\tau,\gamma)$ defined in (4).
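Let us consider first the canonical variance estimator

$$d = \mathcal{R}^2\varphi(\Theta,\Phi;\gamma_0). \tag{32}$$

The diagram of this estimator can be written as

$$\varphi(\theta,\phi;\gamma_0) = \frac{G(\theta,\phi;\gamma_0)}{\mathrm{E}[\mathcal{R}^2 G(\Theta,\Phi;\gamma_0)\,|\,\gamma_0]}, \tag{33}$$

where the function $G(\theta,\phi;\gamma_0)$ will be defined below. The expectation term in the denominator of expression (33) is equal to

$$\mathrm{E}[\mathcal{R}^2 G(\Theta,\Phi;\gamma_0)\,|\,\gamma_0] = \int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \cos\theta\,d\theta\, G(\theta,\phi;\gamma_0)\, g_2(\theta,\phi;\gamma_0),$$

where

$$g_n(\theta,\phi;\gamma) = \int_0^\infty \rho^{2+n}\, Q(\rho\cos\theta\cos\phi,\ \rho\cos\theta\sin\phi,\ \rho\sin\theta;\gamma)\,d\rho. \tag{34}$$

Here $\rho^2\cos\theta$ is the Jacobian of the spherical change of variables (24), and the remaining factor $\rho^n$ comes from the moment $\mathcal{R}^n$.

We stress the important property that the canonical OHLC variance estimator given by (32) with (33) is unbiased at $\gamma = \gamma_0$, since its expectation is

$$\mathrm{E}[d\,|\,\gamma_0] = \frac{\mathrm{E}[\mathcal{R}^2 G(\Theta,\Phi;\gamma_0)\,|\,\gamma_0]}{\mathrm{E}[\mathcal{R}^2 G(\Theta,\Phi;\gamma_0)\,|\,\gamma_0]} = 1.$$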
Thus, we look for the function $G(\theta,\phi;\gamma_0)$ that makes the unbiased canonical variance estimator (32) with (33) the most efficient for a given $\gamma_0$.
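Theorem 3.1 The diagram of the unbiased most efficient homogeneous canonical variance estimator for a given $\gamma_0$ is equal to

$$\varphi(\theta,\phi;\gamma_0) = \frac{g_2(\theta,\phi;\gamma_0)}{\mathcal{E}(\gamma_0)\, g_4(\theta,\phi;\gamma_0)}, \tag{35}$$

where $g_n(\theta,\phi;\gamma)$ is defined by expression (34) and

$$\mathcal{E}(\gamma) = \int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \frac{g_2^2(\theta,\phi;\gamma)}{g_4(\theta,\phi;\gamma)}\cos\theta\,d\theta. \tag{36}$$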

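Proof. The variance of the unbiased homogeneous canonical estimator (32) with (33) is equal to

$$\mathrm{Var}[d\,|\,\gamma_0] = \frac{\displaystyle\int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \cos\theta\,d\theta\, G^2(\theta,\phi;\gamma_0)\, g_4(\theta,\phi;\gamma_0)}{\left(\displaystyle\int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \cos\theta\,d\theta\, G(\theta,\phi;\gamma_0)\, g_2(\theta,\phi;\gamma_0)\right)^2} - 1. \tag{37}$$

We use the Schwarz inequality to determine the minimal value of the variance (37) of the canonical estimator. Omitting for the sake of conciseness the limits in the integrals, we write the Schwarz inequality in the form

$$\left(\iint A(\theta,\phi)B(\theta,\phi)\,d\theta\,d\phi\right)^2 \le \iint A^2(\theta,\phi)\,d\theta\,d\phi \iint B^2(\theta,\phi)\,d\theta\,d\phi,$$

where $A(\theta,\phi)$ and $B(\theta,\phi)$ are arbitrary real-valued functions. Taking here

$$A(\theta,\phi) = G(\theta,\phi;\gamma_0)\sqrt{g_4(\theta,\phi;\gamma_0)\cos\theta}, \qquad B(\theta,\phi) = g_2(\theta,\phi;\gamma_0)\sqrt{\frac{\cos\theta}{g_4(\theta,\phi;\gamma_0)}},$$

we obtain

$$\left(\iint G\,g_2\cos\theta\,d\theta\,d\phi\right)^2 \le \iint G^2 g_4\cos\theta\,d\theta\,d\phi \iint \frac{g_2^2}{g_4}\cos\theta\,d\theta\,d\phi,$$

where all functions are evaluated at $(\theta,\phi;\gamma_0)$.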
It follows from (37) and from the last inequality that the variance of any canonical variance estimator satisfies the inequality

$$\mathrm{Var}[d(\Theta,\Phi;\gamma_0)\,|\,\gamma_0] \ge V(\gamma_0), \qquad V(\gamma) = \frac{1}{\mathcal{E}(\gamma)} - 1, \tag{38}$$

where $\mathcal{E}(\gamma)$ is defined by expression (36). Taking into account (36), (37) and (38), the variance of the canonical variance estimator reaches its minimal value $V(\gamma_0)$ for the following choice of the function $G(\theta,\phi;\gamma_0)$:

$$G(\theta,\phi;\gamma_0) = \frac{g_2(\theta,\phi;\gamma_0)}{g_4(\theta,\phi;\gamma_0)},$$

for which $A(\theta,\phi) = B(\theta,\phi)$, so that the Schwarz inequality becomes an equality. This corresponds to the diagram $\varphi(\theta,\phi;\gamma_0)$ given by expression (35). □
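Theorem 3.1 has a simple discrete analogue that is convenient for numerical work. If the admissible $(\theta,\phi)$ region is partitioned into cells $b$ and the diagram is restricted to be constant on each cell, then $\mathrm{E}[d] = \sum_b \varphi_b m_{2,b}$ and $\mathrm{E}[d^2] = \sum_b \varphi_b^2 m_{4,b}$, where $m_{k,b} = \mathrm{E}[\mathcal{R}^k;\ (\Theta,\Phi)\in b\,|\,\gamma_0]$ is the integral of $g_k\cos\theta$ over the cell. The same Schwarz argument then gives $\varphi_b = m_{2,b}/(\mathcal{E}\,m_{4,b})$ with $\mathcal{E} = \sum_b m_{2,b}^2/m_{4,b}$ and minimal variance $1/\mathcal{E} - 1$, which can only exceed $V(\gamma_0)$ because piecewise-constant diagrams are a subclass. The sketch below assumes the cell moments are given; here they are computed from a synthetic sample, not from the Wiener pdf of the Appendix, and all names are hypothetical.

```python
import numpy as np


def optimal_binned_diagram(m2, m4):
    """Piecewise-constant analogue of Theorem 3.1.

    m2[b] = E[R^2; cell b], m4[b] = E[R^4; cell b] at gamma_0 (all cells non-empty).
    Returns the optimal cell values phi[b], E = sum(m2**2 / m4) and the bound 1/E - 1."""
    m2 = np.asarray(m2, dtype=float)
    m4 = np.asarray(m4, dtype=float)
    E = np.sum(m2**2 / m4)
    phi = m2 / (E * m4)                  # discrete version of (35)
    return phi, E, 1.0 / E - 1.0         # discrete version of (38)


def mean_and_variance(phi, m2, m4):
    """E[d] and Var[d] for d = R^2 * phi[cell], given the cell moments."""
    mean = np.sum(phi * m2)
    return mean, np.sum(phi**2 * m4) - mean**2


# Synthetic stand-in for (R, cell): the cell index and the radius are dependent.
rng = np.random.default_rng(6)
z = rng.random(100_000)
cells = (z * 10).astype(int)                               # 10 hypothetical angular cells
R = (1.0 + z) * np.sqrt(rng.chisquare(3, z.size))
m2 = np.bincount(cells, weights=R**2, minlength=10) / z.size
m4 = np.bincount(cells, weights=R**4, minlength=10) / z.size

phi, E, V = optimal_binned_diagram(m2, m4)
print(mean_and_variance(phi, m2, m4), V)   # mean 1 and variance equal to V
```

Any other unbiased piecewise-constant diagram, i.e. any positive cell profile normalized so that $\sum_b \varphi_b m_{2,b} = 1$, gives a variance at least as large, which is exactly the content of (38) restricted to this class.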
An analogous derivation provides the unbiased most efficient canonical volatility estimator, for a given $\gamma_0$. The main corresponding results are summarized in the following theorem.

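Theorem 3.2 The diagram $\psi(\theta,\phi;\gamma_0)$ of the unbiased most efficient homogeneous canonical OHLC volatility estimator, defined by

$$s = \mathcal{R}\,\psi(\Theta,\Phi;\gamma_0), \tag{39}$$

is equal to

$$\psi(\theta,\phi;\gamma_0) = \frac{g_1(\theta,\phi;\gamma_0)}{\mathcal{F}(\gamma_0)\, g_2(\theta,\phi;\gamma_0)}, \qquad \mathcal{F}(\gamma) = \int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \frac{g_1^2(\theta,\phi;\gamma)}{g_2(\theta,\phi;\gamma)}\cos\theta\,d\theta. \tag{40}$$

The variance of the most efficient canonical OHLC volatility estimator is equal to

$$W(\gamma_0) = \frac{1}{\mathcal{F}(\gamma_0)} - 1. \tag{41}$$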

Definition 3.1 $V(\gamma)$ defined in (38) is called the lowest bound of the variance of the canonical variance estimator, for a given value of the parameter $\gamma$. Analogously, $W(\gamma)$ given by (41) is called the lowest bound of the variance of the canonical volatility estimator, for the given value of the parameter $\gamma$.
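The volatility counterpart can be sketched the same way, this time estimating the cell moments $m_{1,b}$, $m_{2,b}$ directly from a sample of $(\mathcal{R}, \text{cell})$ pairs with `np.bincount`. Under the same piecewise-constant restriction, the optimal cell values are $\psi_b = m_{1,b}/(\mathcal{F}\,m_{2,b})$ with $\mathcal{F} = \sum_b m_{1,b}^2/m_{2,b}$, and the minimal variance is $1/\mathcal{F} - 1$, mirroring (40)-(41). The helper name and the synthetic sample are assumptions for illustration. With a single cell the bound reduces to the squared coefficient of variation of $\mathcal{R}$, i.e. to the naive estimator $s \propto \mathcal{R}$.

```python
import numpy as np


def binned_volatility_diagram(R, cells, n_cells):
    """Piecewise-constant analogue of Theorem 3.2, estimated from a sample.

    R: radii of the canonical (H, L, C) triples; cells: integer cell index of each
    triple's (Theta, Phi). Empty cells get psi = 0. Returns psi, F and 1/F - 1."""
    n = R.size
    m1 = np.bincount(cells, weights=R, minlength=n_cells) / n
    m2 = np.bincount(cells, weights=R**2, minlength=n_cells) / n
    occupied = m2 > 0
    F = np.sum(m1[occupied] ** 2 / m2[occupied])
    psi = np.zeros(n_cells)
    psi[occupied] = m1[occupied] / (F * m2[occupied])
    return psi, F, 1.0 / F - 1.0


rng = np.random.default_rng(7)
z = rng.random(50_000)
cells = (z * 8).astype(int)                       # 8 hypothetical angular cells
R = (1.0 + z) * np.sqrt(rng.chisquare(3, z.size))
psi, F, W = binned_volatility_diagram(R, cells, 8)
s = R * psi[cells]                                # canonical volatility estimates
print(s.mean(), s.var(), W)                       # in-sample: mean 1, variance W
```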
4. PROPERTIES OF MOST EFFICIENT OHLC VARIANCE ESTIMATORS FOR THE WIENER PROCESS

The Appendix derives the explicit expression of the pdf $Q(h,l,c;\gamma)$ of the high, low and close values of the Wiener process $\nu(\tau,\gamma)$ defined in (4). This section uses this explicit knowledge to explore the quantitative properties of the most efficient canonical estimators for this particular case and to compare them with those of the R&S and G&K canonical variance estimators.

[Figure 1 plot: $V(\gamma)$ on the vertical axis, $\gamma$ from 0 to 5 on the horizontal axis.]

Fig. 1: Dependence of the lowest bound $V(\gamma)$ given by (38) of the variance of the homogeneous canonical variance estimator as a function of $\gamma$. $V(0) = 0.2583$.

4.1. Variance of canonical variance estimators

Let us first consider the lowest bound $V(\gamma)$, given by (38), of the variance of the homogeneous canonical variance estimator. For the Wiener process model, it is easy to calculate the function $V(\gamma)$ numerically; it is represented in Figure 1. The variance of the most efficient canonical variance estimator at $\gamma_0 = 0$ is equal to $V(0) \approx 0.258$, which can be compared with the corresponding variances for the G&K and R&S canonical variance estimators: $\mathrm{Var}[d_{GK}|0] \approx 0.27$, $\mathrm{Var}[d_{RS}|0] \approx 0.331$ (Rogers and Satchell, 1991).
Thus, at $\gamma = 0$, the G&K variance estimator has almost the same efficiency as the most efficient (for $\gamma_0 = 0$) homogeneous variance estimator, while the efficiency of the R&S estimator is significantly worse. These results reflect the closeness of the diagrams of the G&K and most efficient estimators, while the diagram of the R&S estimator drastically differs from the diagram of the most efficient estimator, as shown in Figure 2.
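The quoted numbers can be turned into relative efficiencies. Assuming estimates from $n$ independent periods are averaged, the variance of the average is divided by $n$, so the ratio of variances indicates how many more periods an estimator needs to match the precision of the most efficient one at $\gamma_0 = 0$:

$$\frac{\mathrm{Var}[d_{RS}|0]}{V(0)} \approx \frac{0.331}{0.2583} \approx 1.28, \qquad \frac{\mathrm{Var}[d_{GK}|0]}{V(0)} \approx \frac{0.27}{0.2583} \approx 1.05,$$

i.e. roughly 28% more periods for R&S and about 5% more for G&K.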



[Figure 2 plot: two panels, "Most Efficient" vs. "Garman & Klass" (top) and "Most Efficient" vs. "Rogers & Satchell" (bottom); horizontal axis ticks from -1.6 to -0.2.]

Fig. 2: Diagrams of the R&S, G&K and most efficient (for $\gamma_0 = 0$) variance estimators. See Definition 2.7 for the meaning and construction of the diagrams.
4.2. Bias and efficiency of the most efficient (at $\gamma_0$) variance estimator

The homogeneous variance estimator with diagram (35) is unbiased and most efficient only for a given value $\gamma_0$. In general, the value of $\gamma$ is unknown. It is thus necessary to quantify the bias and efficiency of the homogeneous estimator for values $\gamma \neq \gamma_0$, and to compare them with the biases and efficiencies of the G&K and R&S variance estimators. For this, we first determine the expected value and the variance of an arbitrary canonical variance estimator given by (32). Calculations similar to those performed in the previous section yield

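$$\mathrm{E}[d(\Theta,\Phi)\,|\,\gamma] = \mathcal{J}_1(\gamma), \qquad \mathrm{Var}[d(\Theta,\Phi)\,|\,\gamma] = \mathcal{J}_2(\gamma) - \mathcal{J}_1^2(\gamma),$$

$$\mathcal{J}_n(\gamma) = \int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \cos\theta\,d\theta\, \varphi^n(\theta,\phi)\, g_{2n}(\theta,\phi;\gamma). \tag{42}$$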

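Substituting the expression (35) for the diagram of the most efficient estimator into equation (42) yields

$$\mathrm{E}[d(\Theta,\Phi;\gamma_0)\,|\,\gamma] = \frac{1}{\mathcal{E}(\gamma_0)} \int_{-\pi/2}^{0} d\phi \int_{s(\phi)}^{c(\phi)} \cos\theta\,d\theta\, \frac{g_2(\theta,\phi;\gamma_0)\, g_2(\theta,\phi;\gamma)}{g_4(\theta,\phi;\gamma_0)}.$$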

Figure 3 presents the dependence as a function of $\gamma$ of the expected value of the most efficient canonical variance estimators given by (32) with (35). The expectations of the R&S and G&K canonical variance estimators, whose diagrams are given by (29), are also shown for comparison. While the R&S variance estimator is unbiased for all $\gamma$'s, the most efficient estimators at $\gamma_0$ are unbiased only in the neighborhood of $\gamma = 0$ and of $\gamma = \gamma_0$. Comparing the G&K and the most efficient estimators, the homogeneous estimator that is the most efficient for $\gamma_0 = 1$, for instance, is not significantly biased over the whole range $0 \le \gamma < 1.5$ and remains much less biased than the G&K estimator over the range $0 \le \gamma \le 2$.
[Figure 3: expected value versus γ (−2 ≤ γ ≤ 2; vertical axis from 1 to 1.6)]

Fig. 3: Dependence as a function of γ of the expected values of the R&S (dash-dot line) and G&K (dashed line) canonical variance estimators and of the most efficient variance estimators for γ₀ = 0; 0.5; 1 (solid lines, top-down).
[Figure 4: variance versus γ (−2 ≤ γ ≤ 2; vertical axis from 0.2 to 0.7)]

Fig. 4: Dependence as a function of γ of the variances of the R&S (dash-dot line), G&K (dashed line) and most efficient variance estimators for γ₀ = 0; 0.5; 1 (solid lines). The heavy solid line is the lower bound variance V(γ) given by (38).
[Figure 5: variance of the renormalized estimators versus γ (−2 ≤ γ ≤ 2; vertical axis from 0.24 to 0.42)]

Fig. 5: Dependence as a function of γ of the variances of the renormalized R&S (dash-dot line) and G&K (dashed line) canonical variance estimators, and of the most efficient estimators (solid lines), as defined by expression (43).



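Calculation of the variance (for any γ) of the canonical variance estimator which is most efficient at γ₀ gives

$$
\mathrm{Var}[d(\Theta,\Phi;\gamma_0)\,|\,\gamma] = \mathcal{M}(\gamma,\gamma_0) - E^2[d(\Theta,\Phi;\gamma_0)\,|\,\gamma],
$$

$$
\mathcal{M}(\gamma,\gamma_0) = \frac{1}{\mathcal{E}^2(\gamma_0)} \int_{-\pi/2}^{0} d\phi \int_{S(\phi)}^{C(\phi)} \cos\theta\, d\theta\; \frac{g_4(\theta,\phi;\gamma)\, g_2^2(\theta,\phi;\gamma_0)}{g_4^2(\theta,\phi;\gamma_0)}.
$$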

Figure 4 shows the dependence on γ of the variances of the R&S and G&K canonical variance estimators and of the most efficient homogeneous variance estimators for different γ₀. One can observe that the homogeneous variance estimator which is most efficient at γ₀ = 1 is both less biased and significantly more efficient than the G&K estimator over the interval 0 < γ < 2.

One should not be surprised to observe in figure 4 several intervals along the γ axis in which the variances of the estimators are smaller than the lower bound V(γ) given by (38). Indeed, the lower bound (38) applies only to unbiased estimators. Therefore, the "strange" behavior of the variance plots does not mean that these estimators are more efficient than the most efficient estimator at the given γ, but rather that they are biased at this point. With the proper renormalization



$$
d_{\mathrm{r}}(\Theta,\Phi) = \frac{d(\Theta,\Phi)}{E[d(\Theta,\Phi)\,|\,\gamma]}, \tag{43}
$$

one can see that, for all values of γ, the variances of the estimators are indeed bounded from below by the lower bound V(γ), as shown in figure 5.



4.3. Probabilistic properties of homogeneous estimators

The Appendix gives the exact explicit expression of the pdf Q(h, l, c; γ) of the high, low and close values of the Wiener process v(τ, γ) defined in (3). Knowing it, we can go beyond the calculation of the expectations and variances of the estimators described in the previous subsections and derive their full distributions. In particular, knowledge of the full distribution of the estimators allows one to determine the confidence intervals of the quasi-unbiased estimators introduced in section 4.4.

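Let us consider the pdf of the canonical variance estimator (27). For a given γ, it is defined by

$$
f(u;\gamma) = E\big[\delta\big(u - R^2 d(\Theta,\Phi)\big)\,\big|\,\gamma\big].
$$

Using the standard properties of the delta function of a composite argument, we can rewrite this definition in the form

$$
f(u;\gamma) = E\left[\frac{1}{2\sqrt{u\, d(\Theta,\Phi)}}\;\delta\!\left(R - \sqrt{\frac{u}{d(\Theta,\Phi)}}\right)\Bigg|\;\gamma\right],
$$

or more explicitly

$$
f(u;\gamma) = \frac{\sqrt{u}}{2} \int_{-\pi/2}^{0} d\phi \int_{S(\phi)}^{C(\phi)} \frac{\cos\theta\, d\theta}{d^{3/2}(\theta,\phi)}\; Q\!\left(\sqrt{\tfrac{u}{d(\theta,\phi)}}\cos\theta\cos\phi,\ \sqrt{\tfrac{u}{d(\theta,\phi)}}\cos\theta\sin\phi,\ \sqrt{\tfrac{u}{d(\theta,\phi)}}\sin\theta;\ \gamma\right). \tag{44}
$$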
We use expression (44) to obtain, by numerical integration, the pdfs of the R&S, G&K and most efficient (γ₀ = 0) canonical variance estimators, calculated for γ = 0. These three pdfs are represented in figure 6.



[Figure 6: pdfs versus u (0 ≤ u ≤ 3; vertical axis from 0 to 0.8)]

Fig. 6: Pdfs of the R&S (dash-dot line), G&K (dashed line) and most efficient (γ₀ = 0) canonical variance estimators (solid line), at γ = 0.
Similarly, the pdf of the canonical volatility estimator is defined by

$$
p(u;\gamma) = E\big[\delta\big(u - R\, s(\Theta,\Phi)\big)\,\big|\,\gamma\big],
$$

and its explicit expression, analogous to formula (44), reads

$$
p(u;\gamma) = u^2 \int_{-\pi/2}^{0} d\phi \int_{S(\phi)}^{C(\phi)} \frac{\cos\theta\, d\theta}{s^3(\theta,\phi)}\; Q\!\left(\frac{u\cos\theta\cos\phi}{s(\theta,\phi)},\ \frac{u\cos\theta\sin\phi}{s(\theta,\phi)},\ \frac{u\sin\theta}{s(\theta,\phi)};\ \gamma\right). \tag{45}
$$

Figure 7 shows the pdfs given by (45) of the R&S, G&K and most efficient (γ₀ = 0) canonical volatility estimators, for γ = 0.

4.4. Quasi-unbiased quasi-optimal estimators

The previous subsections have made it clear that the most efficient unbiased (γ₀) estimators are neither the most efficient nor unbiased for γ ≠ γ₀. Varying γ₀ amounts to scanning these most efficient estimators, each of which remains efficient in a neighborhood of its own γ₀. This suggests introducing new, reasonably efficient and approximately unbiased estimators, obtained as linear superpositions of the most efficient canonical homogeneous variance estimators:



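$$
d(\Theta,\Phi) = \int_{-\infty}^{\infty} h(\gamma_0)\, \frac{g_2(\Theta,\Phi;\gamma_0)}{\mathcal{E}(\gamma_0)\, g_4(\Theta,\Phi;\gamma_0)}\, d\gamma_0. \tag{46}
$$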
Here, h(γ₀) is a weighting function whose explicit expression must be determined from some optimization criterion. A possible requirement is that h(γ₀) both minimize the bias of the estimator (46) and maximize its efficiency within some given γ interval, according to some criterion. To demonstrate the principle of this approach, we search for the function h₀(γ₀) that makes the estimator (46) unbiased. The corresponding condition is that the expected value of the composed estimator (46) be equal to 1. By linearity, this expected value is

$$
E[d(\Theta,\Phi)\,|\,\gamma] = \int_{-\infty}^{\infty} h_0(\gamma_0)\, E[d(\Theta,\Phi;\gamma_0)\,|\,\gamma]\, d\gamma_0.
$$

The condition E[d(Θ, Φ)|γ] = 1 then provides an integral equation for the function h₀(γ₀). In practice, it is more convenient to look for quasi-unbiased estimators, which are exactly unbiased at 2K + 1 values of the parameter γ, for instance at

$$
\gamma_i = \frac{i}{K}\,\Gamma, \qquad i = -K, -K+1, \dots, -1, 0, 1, \dots, K-1, K. \tag{47}
$$

Given these 2K + 1 constraints, it is natural to search for a solution constructed as the sum of 2K + 1 most efficient (γ_i) canonical variance estimators:

$$
d(\Theta,\Phi) = \sum_{i=-K}^{K} h_i\, d_i(\Theta,\Phi), \qquad d_i(\Theta,\Phi) = \frac{g_2(\Theta,\Phi;\gamma_i)}{\mathcal{E}(\gamma_i)\, g_4(\Theta,\Phi;\gamma_i)}. \tag{48}
$$

The 2K + 1 unknown coefficients {h_i, i = −K, …, +K} are to be determined from the 2K + 1 constraints of an absence of bias at the discrete γ values (47). We refer to Γ as the band width of the quasi-unbiased estimator (48), while K is its order.

In particular, the quasi-unbiased estimator of zero order corresponds to the previously studied most efficient (γ₀ = 0) canonical variance estimator. The first-order quasi-unbiased estimator is equal to

$$
d(\Theta,\Phi) = h_{-1}\, d_{-1}(\Theta,\Phi) + h_0\, d_0(\Theta,\Phi) + h_1\, d_1(\Theta,\Phi), \tag{49}
$$

and so on.

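The expected value of the quasi-unbiased estimator (48) is equal to

$$
E[d(\Theta,\Phi)\,|\,\gamma] = \sum_{i=-K}^{K} h_i\, E[d_i(\Theta,\Phi)\,|\,\gamma]. \tag{50}
$$

Equating expression (50) to 1 for the 2K + 1 values (47) yields the set of 2K + 1 linear equations

$$
E[d(\Theta,\Phi)\,|\,\gamma_j] = 1 \;\Longleftrightarrow\; \sum_{i=-K}^{K} \varepsilon_{i,j}\, h_i = 1, \qquad \varepsilon_{i,j} = E[d_i(\Theta,\Phi)\,|\,\gamma_j], \qquad j = -K, \dots, K. \tag{51}
$$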
[Figure 8: expected value versus γ (−2 ≤ γ ≤ 2)]

Fig. 8: Dependence as a function of γ of the expected values of the R&S (dash-dot line), G&K (dashed line) and first-order quasi-unbiased variance estimators for the band widths Γ = 0.5; 1 (solid lines).

[Figure 9: variance versus γ]

Fig. 9: Dependence as a function of γ of the variances of the R&S (dash-dot line), G&K (dashed line) and first-order quasi-unbiased variance estimators for the band widths Γ = 0.5; 1 (solid lines).



The statistical symmetry of the Wiener process (1) implies that the coefficients of equations (51) satisfy the symmetry conditions $\varepsilon_{i,j} = \varepsilon_{-i,-j}$, and hence that the solution satisfies $h_i = h_{-i}$.

ectaart.cls ver. 2006/04/11 file: 0HLC_EC0N8.tex date: August 12, 2009

MOST EFFICIENT HOMOGENEOUS VOLATILITY ESTIMATORS 29

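Exploiting this symmetry for the first-order case K = 1 yields two equations for $h_1 = h_{-1}$ and $h_0$:

$$
h_0\, \varepsilon_{0,1} + h_1(\varepsilon_{-1,1} + \varepsilon_{1,1}) = 1, \qquad h_0\, \varepsilon_{0,0} + 2 h_1\, \varepsilon_{1,0} = 1,
$$

whose solution reads

$$
h_0 = \frac{2\varepsilon_{1,0} - \varepsilon_{-1,1} - \varepsilon_{1,1}}{2\varepsilon_{0,1}\varepsilon_{1,0} - \varepsilon_{0,0}(\varepsilon_{-1,1} + \varepsilon_{1,1})}, \qquad h_1 = \frac{\varepsilon_{0,0} - \varepsilon_{0,1}}{\varepsilon_{0,0}(\varepsilon_{-1,1} + \varepsilon_{1,1}) - 2\varepsilon_{0,1}\varepsilon_{1,0}}.
$$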
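The closed-form weights can be cross-checked against a direct solve of the linear system (51). In the sketch below, the coefficients $\varepsilon_{i,j}$ are hypothetical placeholder numbers. They are chosen only to satisfy $\varepsilon_{i,i} = 1$ and the symmetry $\varepsilon_{i,j} = \varepsilon_{-i,-j}$. In practice they would come from the expectation integrals of the most efficient estimators.

```python
import numpy as np

def solve_weights(eps):
    """Solve sum_i eps[(i, j)] * h_i = 1 for all j = -K..K.

    eps maps (i, j) -> eps_{i,j} = E[d_i | gamma_j]; returns {i: h_i}."""
    idx = sorted({i for i, _ in eps})
    a = np.array([[eps[(i, j)] for i in idx] for j in idx])  # row j, column i
    h = np.linalg.solve(a, np.ones(len(idx)))
    return dict(zip(idx, h))

def closed_form_k1(eps):
    """First-order (K = 1) closed form, valid when eps_{i,j} = eps_{-i,-j}."""
    s = eps[(-1, 1)] + eps[(1, 1)]
    h0 = (2 * eps[(1, 0)] - s) / (2 * eps[(0, 1)] * eps[(1, 0)] - eps[(0, 0)] * s)
    h1 = (eps[(0, 0)] - eps[(0, 1)]) / (eps[(0, 0)] * s - 2 * eps[(0, 1)] * eps[(1, 0)])
    return h0, h1

# Hypothetical coefficients; the symmetric partners are filled in automatically.
base = {(0, 0): 1.0, (0, 1): 1.08, (1, 0): 0.97, (1, 1): 1.0, (-1, 1): 1.25}
eps = dict(base)
for (i, j), value in base.items():
    eps[(-i, -j)] = value

weights = solve_weights(eps)
print(weights, closed_form_k1(eps))
```

For higher orders K, only the construction of the coefficient table changes. The closed form is specific to K = 1.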

Figure 8 shows the dependence on γ of the expected values of the first-order quasi-unbiased canonical variance estimators for band widths Γ = 0.5; 1. Figure 9 presents the dependence on γ of the variances of these estimators. For comparison, the expected values and variances of the R&S and G&K estimators are also shown. The quasi-unbiased canonical variance estimators constructed here provide the best of both worlds:

(i) they exhibit a very weak bias up to rather large values of γ, thus competing reasonably well with the R&S estimator;

(ii) their variance depends only weakly on γ. It is significantly smaller than that of the R&S estimator for all γ, and smaller than that of the G&K estimator except in a central zone around γ = 0.

5. TESTS OF THEORETICAL RESULTS OF VARIANCE AND VOLATILITY 
ESTIMATORS USING SYNTHETIC TIME SERIES OF THE WIENER PROCESS 

The present section implements the variance and volatility estimators discussed above on synthetic time series of the Wiener process (3). Because our results are mathematically exact, these tests on synthetic time series offer the opportunity to study the impact of finite-size and discreteness effects, as well as additional properties of the estimators. We will also determine the maximum likelihood estimators of the variance and the volatility and compare them with the other estimators. The homogeneity of the estimators under study allows us to restrict σ to the value 1 and to construct time series on the unit time interval (T = 1) without loss of generality. With these parameter values, we have μ = γ, and X(t) is replaced by v(τ, γ) given by (1).



5.1. Test on numerical convergence of the discrete to the continuous Wiener process

It is interesting to illustrate and test the theoretical results of the previous sections by numerical simulation of the Wiener process X(t) given by (3). Numerical simulations require replacing the continuous-time stochastic process v(τ, γ) given by (1) by its discrete counterpart v(n, γ) given by (6), where the {ε_k} are Gaussian.

The Gaussian discrete process v(n, γ) represents the continuous-time process v(τ, γ) accurately only for sufficiently large N. On the other hand, the discrete process (6) obtained for moderate N might describe the stochastic behavior of some financial markets more adequately than the continuous-time process v(τ, γ). From a practitioner's point of view, N can be interpreted as the typical number of transactions within the time interval of interest. From a theoretical point of view, N should be chosen large enough to simulate the variables H, L and C defined by (15). These variables are known to be distributed according to the analytically derived pdf (A.16) with (A.17).

To determine the appropriate value of N, we repeated M = 10⁶ simulations of the discrete process v(n, γ) (6). For each of these M samples we calculated the corresponding G&K and R&S variance estimators at γ = 0. Averaging over the M realizations, we obtained the expected values of the G&K and R&S variance estimators as a function of N, shown in figure 10. In particular, the statistical average of the canonical R&S estimator for N = 10⁶ is found to be $E[d_{RS}\,|\,\gamma = 0] = 0.9987$, which is close enough to the theoretical value $E[d_{RS}\,|\,\gamma = 0] = 1$.
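The finite-N bias can be seen on a small scale. The sketch below uses far fewer realizations than the M = 10⁶ of the text, an arbitrary choice to keep it fast. It averages the R&S estimator at γ = 0 over discrete Gaussian paths of increasing length N. Because the extremes of a discrete path underestimate those of the continuous process, the average should increase towards 1 as N grows.

```python
import numpy as np

def mean_rs(n_steps, n_paths=4000, chunk=500, seed=1):
    """Average Rogers-Satchell estimate at gamma = 0 for N = n_steps Gaussian steps."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for start in range(0, n_paths, chunk):
        m = min(chunk, n_paths - start)
        # Unit volatility on the unit interval: each step has variance 1/N.
        paths = np.cumsum(rng.standard_normal((m, n_steps)) / np.sqrt(n_steps), axis=1)
        h = np.maximum(paths.max(axis=1), 0.0)   # the open (0) counts for the extremes
        l = np.minimum(paths.min(axis=1), 0.0)
        c = paths[:, -1]
        total += np.sum(h * (h - c) + l * (l - c))
    return total / n_paths

for n in (4, 16, 64, 256, 1024):
    print(n, round(mean_rs(n), 4))
```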
[Figure 10: statistical average versus N]

Fig. 10: Dependence as a function of N of the statistical average value of the G&K and R&S variance estimators for γ = 0. The statistical average is performed over M = 10⁶ realizations of the discrete-time Wiener process v(t, γ) given by (4). Note that the two curves are almost indistinguishable, but not exactly the same.



5.2. Variance estimators 



Figures 11 and 12 show the expected values and variances of the G&K, R&S and most efficient variance estimators, obtained theoretically and by numerical simulations with M = 10⁵ realizations of v(n, γ), each of length N = 10⁶. One can observe an excellent agreement between the simulations and the theory.
Fig. 11: Dependence of the expected values of the R&S (squares and dashed line), G&K (triangles and dash-dot line) and most efficient (at γ₀ = 0; 0.5; 1) (circles and solid lines) variance estimators as a function of γ. The markers show the values obtained by the numerical simulations described in the text; the continuous lines correspond to the theoretical results presented in sections 3 and 4.
Fig. 12: Dependence of the variances of the R&S (squares and dashed line), G&K (triangles and dash-dot line) and most efficient (at γ₀ = 0; 0.5; 1) (circles and solid lines) variance estimators as a function of γ. The markers show the values obtained by the numerical simulations described in the text; the continuous lines correspond to the theoretical results presented in sections 3 and 4.
5.3. Volatility estimators

We now compare the efficiency and bias of the R&S, G&K and most efficient canonical volatility estimators. Recall that, while the R&S canonical variance estimator (13) is unbiased for all γ, the R&S canonical volatility estimator (16) is biased for all γ. The same holds true for the G&K volatility estimator, which is biased even for γ = 0. Figure 13 shows the dependence of the expected values of these estimators on γ. In particular, the G&K and R&S volatility estimators have the following biases at γ = 0:

$$
1 - E[s_{GK}\,|\,\gamma = 0] = 0.0309, \qquad 1 - E[s_{RS}\,|\,\gamma = 0] = 0.0386.
$$



[Figure 13: expected value versus γ (0 ≤ γ ≤ 2; vertical axis from 0.95 to 1.25)]

Fig. 13: Dependence as a function of γ of the expected values of the R&S (dash-dot), G&K (dashed) and most efficient at γ₀ = 0 (solid line) volatility estimators.

In order to provide an appropriate comparison between the efficiencies of the R&S, G&K and most efficient volatility estimators, we normalize them by their expected values at γ = 0:

$$
s_{\mathrm{norm}}(\Theta,\Phi) = \frac{s(\Theta,\Phi)}{E[s(\Theta,\Phi)\,|\,\gamma = 0]}. \tag{52}
$$
Fig. 14: Expected values of the normalized (according to (52)) R&S, G&K and most efficient unbiased homogeneous canonical volatility estimators at γ₀ = 0; 0.5; 1.

[Figure 15: variance versus γ]

Fig. 15: Variances of the normalized (according to (52)) R&S, G&K and most efficient unbiased homogeneous canonical volatility estimators at γ₀ = 0; 0.5; 1. The heavy solid line corresponds to the lower bound variance W(γ) given by (41).



Figures 14 and 15 show the expected values and variances of the normalized (according to (52)) R&S, G&K and most efficient unbiased homogeneous canonical volatility estimators at γ₀ = 0; 0.5; 1. In particular, the variances at γ = 0 of the normalized G&K, R&S and most efficient (at γ₀ = 0) volatility estimators are equal to

$$
\mathrm{Var}[s_{GK}\,|\,\gamma = 0] = 0.06379, \qquad \mathrm{Var}[s_{RS}\,|\,\gamma = 0] = 0.08186, \qquad \mathrm{Var}[s(\gamma_0 = 0)\,|\,\gamma = 0] = 0.06201. \tag{53}
$$

The theoretical results shown in figures 14 and 15 are also compared with numerical calculations performed using M = 10⁵ different realizations of the discrete Wiener process (6) of length N = 10⁶.

5.4. Maximum likelihood estimators

The Appendix derives the exact expression for the joint distribution of H, L, C of a Wiener process. Since this joint distribution depends on the volatility σ, it allows us to obtain the maximum likelihood (ML) estimator of σ, as we now describe. It turns out that the ML estimator is less efficient than the most efficient homogeneous estimators described above.

Let us start from Q(h, l, c; γ) given by (A.16) in the Appendix. This is the pdf of the high, low and close values (H, L, C), defined by (15), of the Wiener process v(τ, γ) (1) with unit volatility. Knowing Q(h, l, c; γ), one can recover the pdf 𝒬(η, λ, ζ; μ, σ) of the high, low and close values (H, L, C), defined by (10), of the original Wiener process X(t) (3) for t ∈ (0, T). This uses the relation

$$
\mathcal{Q}(\eta,\lambda,\zeta;\mu,\sigma) = \frac{1}{\sigma^3 T^{3/2}}\, Q\!\left(\frac{\eta}{\sigma\sqrt{T}}, \frac{\lambda}{\sigma\sqrt{T}}, \frac{\zeta}{\sigma\sqrt{T}}; \gamma\right) = \frac{1}{\sqrt{2\pi}\,\sigma^3 T^{3/2}} \exp\!\left(-\frac{(\zeta - \mu T)^2}{2\sigma^2 T}\right) \mathcal{R}\!\left(\frac{\eta}{\sigma\sqrt{T}}, \frac{\lambda}{\sigma\sqrt{T}}\,\Big|\,\frac{\zeta}{\sigma\sqrt{T}}\right), \tag{54}
$$

with γ = μ√T/σ.

This expression for the pdf of (H, L, C) allows us to construct the maximum likelihood OHLC estimators μ̂_ML and σ̂_ML of the drift and the volatility of the Wiener process X(t) defined by (3). The ML estimators are obtained by replacing the arguments (η, λ, ζ) in (54) by the realized samples (H, L, C). One then searches for the values μ̂_ML and σ̂_ML that maximize the log-likelihood function

$$
\mathcal{L}(H,L,C;\mu,\sigma) = \ln \mathcal{Q}(H,L,C;\mu,\sigma) = -\frac{(C - \mu T)^2}{2\sigma^2 T} + \ln \mathcal{R}\!\left(\frac{H}{\sigma\sqrt{T}}, \frac{L}{\sigma\sqrt{T}}\,\Big|\,\frac{C}{\sigma\sqrt{T}}\right) - 3\ln\sigma + \mathrm{const}.
$$

We obtain the ML drift estimator

$$
\hat\mu_{ML} = \frac{C}{T}. \tag{55}
$$

We recall that this drift estimator (55) has the minimal possible variance among all unbiased estimators, since it attains the lower bound given by the Cramér-Rao inequality.

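The ML volatility estimator σ̂_ML maximizes the function

$$
\ln \mathcal{R}\!\left(\frac{H}{\sigma\sqrt{T}}, \frac{L}{\sigma\sqrt{T}}\,\Big|\,\frac{C}{\sigma\sqrt{T}}\right) - 3\ln\sigma. \tag{56}
$$

The following theorem then holds.

Theorem 5.1 The ML volatility estimator σ̂_ML is homogeneous, i.e., it can be written in the form

$$
\hat\sigma_{ML} = \sigma\, s_{ML}(\tilde H, \tilde L, \tilde C), \qquad \tilde H = \frac{H}{\sigma\sqrt{T}},\quad \tilde L = \frac{L}{\sigma\sqrt{T}},\quad \tilde C = \frac{C}{\sigma\sqrt{T}},
$$

where s_ML(h, l, c) is a first-order homogeneous function.

Proof. Replace σ by σ s_ML in expression (56). Using the above equalities for H̃, L̃, C̃ and omitting the nonessential constant 3 ln σ, we obtain that s_ML should maximize the function

$$
\mathcal{N}(\tilde H,\tilde L,\tilde C, s_{ML}) = \ln \mathcal{R}\!\left(\frac{\tilde H}{s_{ML}}, \frac{\tilde L}{s_{ML}}\,\Big|\,\frac{\tilde C}{s_{ML}}\right) - 3 \ln s_{ML}. \tag{57}
$$

Here, ℛ(h, l | c) is a deterministic function given by (A.17). Accordingly, the value s_ML which maximizes the function 𝒩(H̃, L̃, C̃, s_ML) is a deterministic function s_ML = s_ML(H̃, L̃, C̃) of the variables (H̃, L̃, C̃). Its homogeneity follows from the identity 𝒩(λH̃, λL̃, λC̃, λs) = 𝒩(H̃, L̃, C̃, s) − 3 ln λ. The maximizer therefore scales as s_ML(λH̃, λL̃, λC̃) = λ s_ML(H̃, L̃, C̃). □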
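The proof translates into a direct numerical recipe. The sketch below is an illustration under stated assumptions, not the authors' implementation. It evaluates ℛ(h, l | c) from (A.17) with a truncated series. It then maximizes 𝒩 of (57) over s by a golden-section search in ln s, on an arbitrarily chosen bracket [0.1, 10]. Finally, it checks the first-order homogeneity of Theorem 5.1 by rescaling the sample. All function names are hypothetical.

```python
import math

def cond_pdf_hl(h, l, c):
    """R(h, l | c) of (A.17) for h >= max(0, c), l <= min(0, c); series truncated in m."""
    w = h - l
    m_max = int(8.0 / w) + 5  # summands decay roughly like exp(-2 m^2 w^2)

    def p(a):
        return ((c - 2 * a) ** 2 - 1.0) * math.exp(2 * a * (c - a))

    return 4.0 * sum(m * (m * p(m * w) + (1 - m) * p(m * w + l))
                     for m in range(-m_max, m_max + 1))

def n_func(s, h, l, c):
    """N(h, l, c, s) of (57); -inf if the truncated series is not positive."""
    r = cond_pdf_hl(h / s, l / s, c / s)
    return math.log(r) - 3.0 * math.log(s) if r > 0 else -math.inf

def s_ml(h, l, c, lo=0.1, hi=10.0, iters=80):
    """Maximize n_func over s by golden-section search in ln s on [lo, hi]."""
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = n_func(math.exp(x1), h, l, c), n_func(math.exp(x2), h, l, c)
    for _ in range(iters):
        if f1 < f2:                      # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = n_func(math.exp(x2), h, l, c)
        else:                            # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = n_func(math.exp(x1), h, l, c)
    return math.exp(0.5 * (a + b))

s1 = s_ml(1.0, -0.6, 0.3)
s2 = s_ml(2.0, -1.2, 0.6)   # the same sample scaled by 2
print(s1, s2, s2 / s1)
```

The search works in ln s because rescaling the sample by λ then simply shifts the maximizer by ln λ, which makes the homogeneity check direct.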

Remark 5.1 From general properties of maximum likelihood estimators (their invariance under reparametrization), the ML variance estimator is also homogeneous and is equal to the square of the volatility estimator:

$$
\hat D_{ML} = \sigma^2 d_{ML}(\tilde H,\tilde L,\tilde C), \qquad d_{ML}(\tilde H,\tilde L,\tilde C) = s_{ML}^2(\tilde H,\tilde L,\tilde C). \tag{58}
$$

In general, ML estimators are biased. It is therefore convenient to normalize the ML estimator by its expected value at some given γ = γ₀ to obtain

$$
s_{\mathrm{norm}} = \frac{s_{ML}(\tilde H,\tilde L,\tilde C)}{E[s_{ML}(\tilde H,\tilde L,\tilde C)\,|\,\gamma_0]}.
$$

The normalized ML estimators are homogeneous and unbiased at γ₀, so they cannot be more efficient than the most efficient estimators at the same γ₀ value. In practice, unbiased ML estimators are significantly less efficient than the most efficient ones. Let us illustrate this fact using the normalized ML volatility estimator at γ₀ = 0. For this case, the numerical calculation (with N = 10⁶, M = 10⁶) of the expected value and variance at γ = 0 of the canonical ML estimator yields

$$
E[s_{ML}\,|\,0] \approx 0.9202, \qquad \mathrm{Var}[s_{ML}\,|\,0] \approx 0.0712, \qquad \mathrm{Var}[s_{\mathrm{norm}}\,|\,0] \approx 0.0840. \tag{59}
$$

Comparing these values with those reported in (53), one can see that the efficiency of the ML volatility estimator is significantly worse than that of the most efficient one, and even worse than that of the R&S volatility estimator.
The corresponding values for the ML canonical variance estimator are

$$
E[d_{ML}\,|\,0] \approx 0.9179, \qquad \mathrm{Var}[d_{ML}\,|\,0] \approx 0.2756, \qquad \mathrm{Var}[d_{\mathrm{norm}}\,|\,0] \approx 0.3271. \tag{60}
$$

This variance $\mathrm{Var}[d_{\mathrm{norm}}\,|\,0]$ is smaller than that of the R&S canonical variance estimator ($\mathrm{Var}_{RS}[d\,|\,0] \approx 0.331$). It is nevertheless 27% larger than the variance of the most efficient one (V(0) ≈ 0.258).
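The normalized values in (59) and (60), and the 27% excess, follow from Var[x / E x] = Var[x] / (E x)². A short Python check using only the numbers quoted above reproduces them up to rounding of the inputs:

```python
def normalized_variance(mean, var):
    """Variance of x / E[x], given E[x] and Var[x]."""
    return var / mean ** 2

vol = normalized_variance(0.9202, 0.0712)   # ML volatility estimator, (59)
var = normalized_variance(0.9179, 0.2756)   # ML variance estimator, (60)
excess = var / 0.258 - 1.0                  # relative to the lower bound V(0) ~ 0.258
print(round(vol, 4), round(var, 4), f"{excess:.0%}")
```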

6. CONCLUSIONS 

We have laid the first stones of a comprehensive theory of homogeneous volatility (and variance) estimators of arbitrary stochastic processes. Our focus has been to exploit the universally quoted OHLC (open-high-low-close) prices, which can span time intervals extending from seconds to years, in order to develop new efficient estimators. Our theory opens many possibilities to design new efficient estimators, such as the "quasi-unbiased estimators", that address any type of desirable constraint. The main tool of our theory is the parsimonious encoding of all the information contained in the OHLC prices in the form of general "diagrams". These diagrams are associated with the joint distributions of the high-minus-open, low-minus-open and close-minus-open values. The diagrams can be tailored to yield the most efficient estimators associated with any statistical properties of the underlying log-price stochastic process.
Our theory also opens several interesting developments. First, the accurate determination of the key functions g_n(θ, φ; γ), which define the above diagrams, gives the tools to develop efficient estimators of the variance and volatility (as well as of any other quantities of interest) for arbitrary non-Gaussian log-price processes, including the presence of microstructure as in tick-by-tick price series. Our methods should lead to the development of fast and effective algorithms for low- and high-frequency OHLC variance and volatility estimators, which can be applied in practice to any kind of financial market.
In the main text, we lay out the basic stones of a comprehensive theory of homogeneous OHLC volatility and variance estimators, which are most efficient for any specific value of the normalized drift parameter γ of the underlying price stochastic process. This theory uses the OHLC (open-high-low-close) prices in the given time interval or scale of interest.
All expressions depend on a fundamental quantity: the joint probability density function (pdf) Q(h, l, c; γ), defined by (31), of the high, low and close values given by (15) of the auxiliary stochastic process B(t, γ) (2). In general, it is only possible to construct the sought pdf Q(h, l, c; γ) by numerical simulations generating a huge number of realizations of the underlying stochastic process B(t, γ). For certain stochastic processes X(t) (10), the pdf Q(h, l, c; γ) can be calculated analytically. In this Appendix, we obtain the explicit analytical expression for Q(h, l, c; γ) in the case of the Wiener process, B(t, γ) = v(τ, γ) given by expression (1).
As shown below, the sought pdf Q(h, l, c; γ) will be derived from the solution of the diffusion equation

$$
\frac{\partial f(c;\tau,\gamma)}{\partial \tau} + \gamma\, \frac{\partial f(c;\tau,\gamma)}{\partial c} = \frac{1}{2}\, \frac{\partial^2 f(c;\tau,\gamma)}{\partial c^2}, \tag{A.1}
$$

where the reduced time τ and the parameter γ are defined in the main text. The well-known solution of the diffusion equation (A.1) satisfying the initial condition

$$
f(c;\tau = 0,\gamma) = \delta(c) \tag{A.2}
$$

is

$$
f(c;\tau,\gamma) = g(c - \gamma\tau, \tau), \qquad g(x,\tau) = \frac{1}{\sqrt{2\pi\tau}} \exp\!\left(-\frac{x^2}{2\tau}\right). \tag{A.3}
$$
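As a quick sanity check of (A.3), central finite differences confirm that g(c − γτ, τ) satisfies (A.1) at arbitrary points. The step size and the test points below are arbitrary. The residual should vanish up to discretization error.

```python
import math

def g(x, tau):
    """Gaussian kernel g(x, tau) of (A.3)."""
    return math.exp(-x * x / (2 * tau)) / math.sqrt(2 * math.pi * tau)

def f(c, tau, gamma):
    """Free solution f(c; tau, gamma) = g(c - gamma * tau, tau)."""
    return g(c - gamma * tau, tau)

def pde_residual(c, tau, gamma, eps=1e-4):
    """df/dtau + gamma * df/dc - 0.5 * d2f/dc2, by central differences."""
    f_tau = (f(c, tau + eps, gamma) - f(c, tau - eps, gamma)) / (2 * eps)
    f_c = (f(c + eps, tau, gamma) - f(c - eps, tau, gamma)) / (2 * eps)
    f_cc = (f(c + eps, tau, gamma) - 2 * f(c, tau, gamma) + f(c - eps, tau, gamma)) / eps ** 2
    return f_tau + gamma * f_c - 0.5 * f_cc

print(pde_residual(0.3, 0.7, 0.8), pde_residual(-1.1, 2.0, -0.5))
```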
A.1. Distribution of the maximal value

The full derivation of the pdf Q(h, l, c; γ) for the Wiener process v(τ, γ) (3) involves rather extensive calculations. To convey the intuition behind them, it is useful to consider a reduced problem: determining the joint pdf of the high (maximum) and close values of the Wiener process v(τ'; γ) within a given time interval τ' ∈ (0, τ). This reduced problem is tightly connected with the so-called "absorption" of the process v(τ'; γ) at the given level h. Absorption amounts to supplementing the diffusion equation (A.1) with the absorbing boundary condition

$$
f(c = h; \tau, \gamma) = 0, \qquad h > 0. \tag{A.4}
$$

We denote the solution of the initial-boundary value problem (A.1), (A.2), (A.4) by f(c, h; τ, γ). This function is the pdf of the values, at time τ, of those realizations of the stochastic process v(τ'; γ) that have not reached the level h for any time τ' ∈ (0, τ), i.e.,

$$
f(x, h; \tau, \gamma)\, dx = \Pr\{v(\tau;\gamma) \in (x, x + dx),\ H_\tau < h\}, \qquad x < h,\ h > 0, \tag{A.5}
$$

where

$$
H_\tau = \sup_{\tau' \in (0,\tau)} v(\tau', \gamma).
$$

Correspondingly, expression (A.5) implies that the joint pdf of the random variables C = v(τ, γ) and H_τ is equal to

$$
Q(h, c; \gamma, \tau) = \frac{\partial f(c, h; \tau, \gamma)}{\partial h}, \qquad h > 0,\ c < h. \tag{A.6}
$$

Then, the joint pdf of the high and close values of the stochastic process v(τ', γ) within the interval τ' ∈ (0, 1) is obtained by taking τ = 1 in expression (A.6):

$$
Q(h, c; \gamma) = \frac{\partial f(c, h; \tau = 1, \gamma)}{\partial h}, \qquad h > 0,\ c < h. \tag{A.7}
$$
The joint pdf of the high and close values of Brownian motion was derived by Paul Lévy (1948).

The solution of the initial-boundary value problem (A.1), (A.2), (A.4) can be obtained by the reflection method, which replaces the initial-boundary value problem by the following auxiliary initial-value problem:

$$
\frac{\partial f(c,h;\tau,\gamma)}{\partial \tau} + \gamma\, \frac{\partial f(c,h;\tau,\gamma)}{\partial c} = \frac{1}{2}\, \frac{\partial^2 f(c,h;\tau,\gamma)}{\partial c^2}, \qquad f(c, h; \tau = 0, \gamma) = \delta(c) - A\, \delta(c - 2h), \tag{A.8}
$$

where the constant A has to be chosen such that the solution of the initial-value problem (A.8) satisfies the absorbing boundary condition (A.4).



The solution of the initial-value problem (A.8) is simply

$$
f(c, h; \tau, \gamma) = g(c - \gamma\tau, \tau) - A\, g(c - 2h - \gamma\tau, \tau),
$$

where g(x, τ) is given in (A.3). Substituting this expression into the boundary condition (A.4) yields A = e^{2hγ}. Thus, the solution of the initial-boundary value problem is

$$
f(c, h; \tau, \gamma) = g(c - \gamma\tau, \tau) - e^{2h\gamma}\, g(c - 2h - \gamma\tau, \tau). \tag{A.9}
$$

Substituting it into expression (A.7) yields the joint pdf of the high and close variables,

$$
Q(h, c; \gamma) = f(c;\gamma)\, \mathcal{H}(h\,|\,c), \qquad c < h,\ h > 0, \tag{A.10}
$$

where

$$
f(c;\gamma) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(c - \gamma)^2}{2}\right) \tag{A.11}
$$

is the pdf of the close value c = v(1, γ), while

$$
\mathcal{H}(h\,|\,c) = 2(2h - c)\, e^{2h(c - h)}, \qquad h \ge \max\{0, c\},
$$

is the pdf of the high value, conditioned on the close value being equal to c.
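The conditional survival function implied by ℋ(h | c) is $\Pr\{\text{high} > h \mid c\} = e^{-2h(h-c)}$ for h ≥ max{0, c}, so the high can be sampled exactly by inverting it. The sketch below is an illustration, not a procedure from the paper. It draws the close from (A.11) with γ = 0 and samples the high conditionally. It then compares two sample statistics with standard results for the maximum of Brownian motion on [0, 1]: the mean with √(2/π), and the tail Pr{high > 1} with 2[1 − Φ(1)] ≈ 0.3173. Sample size and seed are arbitrary.

```python
import numpy as np

def sample_close_high(gamma, n, seed=2):
    """Exact draws of (close, high) for v(tau, gamma) on the unit interval."""
    rng = np.random.default_rng(seed)
    c = gamma + rng.standard_normal(n)               # close ~ f(c; gamma), (A.11)
    u = 1.0 - rng.random(n)                          # uniform on (0, 1], avoids log(0)
    h = 0.5 * (c + np.sqrt(c * c - 2.0 * np.log(u)))  # solves exp(-2h(h - c)) = u
    return c, h

c, h = sample_close_high(0.0, 200_000)
print(h.mean(), np.sqrt(2.0 / np.pi), np.mean(h > 1.0))
```

Since ℋ(h | c) does not depend on γ, the same conditional step works for any drift; only the draw of the close changes.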






42 A. SAICHEV, D. SORNETTE, V. FILIMONOV 

A.2. Wiener process between two absorbing boundaries

The joint pdf Q(h, l, c; γ), defined by (31), of the high, low and close values of the Wiener process can be expressed similarly to relation (A.7). It follows from the solution of the diffusion equation (A.1) in the presence of two absorbing boundaries. We thus consider the new initial-boundary value problem

$$
\begin{aligned}
&\frac{\partial f(c,h,l;\tau,\gamma)}{\partial \tau} + \gamma\, \frac{\partial f(c,h,l;\tau,\gamma)}{\partial c} = \frac{1}{2}\, \frac{\partial^2 f(c,h,l;\tau,\gamma)}{\partial c^2}, \qquad f(c,h,l;\tau = 0,\gamma) = \delta(c), \\
&f(c = h + u\tau, h, l; \tau, \gamma) = 0, \qquad f(c = l + v\tau, h, l; \tau, \gamma) = 0.
\end{aligned} \tag{A.12}
$$

Using the reflection method and a derivation similar to that leading to expression (A.9), we obtain

$$
f(c,h,l;\tau,\gamma) = \sum_{m=-\infty}^{\infty} e^{2(v-u)\,m\,(m(h-l)+l)} \Big[ e^{-2(\gamma - v)(h-l)m}\, g\big(c - \gamma\tau + 2(h-l)m, \tau\big) - e^{2(\gamma - v)((h-l)m + l)}\, g\big(c - \gamma\tau - 2l - 2(h-l)m, \tau\big) \Big]. \tag{A.13}
$$

Figure A1 plots the function

$$
f(c, h, l; \tau, \gamma) + 0.05\,\tau \tag{A.14}
$$

as a function of the close value c. The term 0.05τ is added in order to show clearly that f(c, h, l; τ, γ) indeed satisfies the moving absorption conditions (A.12).

We need the particular case corresponding to static boundaries (u — v — 0) 
to transform the general solution flA.131) into 

00 

/(c,M;t j7 )= E [e 2 ^-^(c-7r + 2(/ i -/)m,r)- 

m=— 00 

(A ' 15) e^-^+'^c - 7 r - 2/ - 2{h - l)m, r)] , 

I < c< h , h > , /<0. 
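The image series is straightforward to evaluate numerically. The sketch below truncates (A.13) at |m| ≤ 30, an assumption that is ample for the parameters of Figure A1. It checks that f vanishes on both moving boundaries and is positive inside. For the static case (A.15), it also estimates the surviving probability mass ∫ f dc with a midpoint rule. Names and grid size are illustrative.

```python
import math

def log_g(x, tau):
    """Logarithm of the Gaussian kernel g(x, tau) of (A.3)."""
    return -x * x / (2 * tau) - 0.5 * math.log(2 * math.pi * tau)

def f_two_barriers(c, h, l, tau, gamma, u=0.0, v=0.0, m_max=30):
    """Truncated image series (A.13); u = v = 0 gives the static case (A.15)."""
    w = h - l
    total = 0.0
    for m in range(-m_max, m_max + 1):
        pre = 2 * (v - u) * m * (m * w + l)
        # Exponents are combined before exponentiating to avoid overflow.
        first = pre - 2 * (gamma - v) * w * m + log_g(c - gamma * tau + 2 * w * m, tau)
        second = pre + 2 * (gamma - v) * (w * m + l) + log_g(c - gamma * tau - 2 * l - 2 * w * m, tau)
        total += math.exp(first) - math.exp(second)
    return total

# Parameters of Fig. A1.
h, l, u, v, gamma, tau = 1.0, -1.0, 0.5, -0.25, 0.8, 1.5
print(f_two_barriers(h + u * tau, h, l, tau, gamma, u, v))   # upper boundary
print(f_two_barriers(l + v * tau, h, l, tau, gamma, u, v))   # lower boundary
print(f_two_barriers(0.0, h, l, tau, gamma, u, v))           # interior point

# Static boundaries at tau = 1: probability of not having left (l, h).
n = 2000
mass = sum(f_two_barriers(l + (k + 0.5) * (h - l) / n, h, l, 1.0, gamma)
           for k in range(n)) * (h - l) / n
print(mass)
```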
Fig. A1: Plots of the function (A.14) as a function of the close value c for h = 1, l = −1, u = 0.5, v = −0.25, γ = 0.8 and for τ = 1 + 0.25k, where k = 0, 1, …, 9.



A.3. Distribution of high, low, close values

The joint pdf Q(h, l, c; γ) corresponding to the diffusion process v(τ', γ) within the time interval τ' ∈ (0, 1) is obtained from the pdf f(c, h, l; τ, γ) given by (A.15). The relation, analogous to (A.7), is

$$
Q(h, l, c; \gamma) = -\frac{\partial^2 f(c, h, l; \tau = 1, \gamma)}{\partial h\, \partial l}.
$$

Analogously to expression (A.10), we obtain

$$
Q(h, l, c; \gamma) = f(c;\gamma)\, \mathcal{R}(h, l\,|\,c), \qquad h > 0,\ l < 0,\ l < c < h, \tag{A.16}
$$

where f(c; γ) is given by (A.11), while ℛ(h, l | c) is the joint pdf of the high and low values under the condition that the close value is equal to c:

$$
\mathcal{R}(h, l\,|\,c) = 4 \sum_{m=-\infty}^{\infty} m \Big[ m\, P\big(m(h-l), c\big) + (1 - m)\, P\big(m(h-l) + l, c\big) \Big], \qquad P(h, c) = \big[(c - 2h)^2 - 1\big]\, e^{2h(c-h)}. \tag{A.17}
$$
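A numerical consistency check of (A.17): integrating ℛ(h, l | c) over the low value should recover the conditional pdf of the high, 2(2h − c)e^{2h(c−h)}, obtained in A.1. The sketch below does this with a midpoint rule on the truncated range (−6, min{0, c}). The truncation, the grid size and the test points are arbitrary choices.

```python
import math

def p_fun(a, c):
    """P(a, c) = [(c - 2a)^2 - 1] exp(2a(c - a)), as in (A.17)."""
    return ((c - 2 * a) ** 2 - 1.0) * math.exp(2 * a * (c - a))

def cond_pdf_hl(h, l, c):
    """R(h, l | c) of (A.17), with the series over m truncated."""
    w = h - l
    m_max = int(8.0 / w) + 5
    return 4.0 * sum(m * (m * p_fun(m * w, c) + (1 - m) * p_fun(m * w + l, c))
                     for m in range(-m_max, m_max + 1))

def cond_pdf_h(h, c):
    """Conditional pdf of the high given the close, from A.1."""
    return 2 * (2 * h - c) * math.exp(2 * h * (c - h))

def integrate_low(h, c, l_min=-6.0, n=4000):
    """Midpoint rule for the integral of R(h, l | c) over l in (l_min, min(0, c))."""
    upper = min(0.0, c)
    step = (upper - l_min) / n
    return sum(cond_pdf_hl(h, l_min + (k + 0.5) * step, c) for k in range(n)) * step

for h, c in [(0.8, 0.2), (1.2, -0.4)]:
    print(integrate_low(h, c), cond_pdf_h(h, c))
```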
Figure A2 shows the contour lines of the conditional pdf ℛ(h, l | c) for c = 0 in the plane (h, l). Skorohod (1964) reported the joint distribution of the high, low and close for random walks with zero drift (γ = 0).

[Figure A2: contour plot in the (h, l) plane]

Fig. A2: Contour lines of the conditional pdf ℛ(h, l | c) given by (A.17) for c = 0 in the plane (h, l).



A.4. Function g_n defined in expression (34)

As seen from expressions (35) and (40), the diagrams (see Definition 2.7) of the most efficient estimators are expressed via the function g_n(θ, φ; γ) defined by equation (34). The above calculations, valid for the Wiener process, show that it is equal to

$$
g_n(\theta,\phi;\gamma) = \frac{4}{\sqrt{2\pi}}\, e^{-\gamma^2/2} \sum_{m=-\infty}^{\infty} m \Big[ m\, I_n\big(m(h-l), c; \gamma\big) + (1 - m)\, I_n\big(m(h-l) + l, c; \gamma\big) \Big],
$$

where

$$
I_n(\eta, c; \gamma) = \int_0^\infty \rho^{2+n} \exp\!\left(\gamma c \rho - \frac{c^2 \rho^2}{2}\right) P(\eta\rho, c\rho)\, d\rho
$$

and

$$
h = \cos\theta\cos\phi, \qquad l = \cos\theta\sin\phi, \qquad c = \sin\theta.
$$
In particular,

$$
I_n(\eta, c; \gamma = 0) = \frac{F(n)}{|2\eta - c|^{3+n}}, \qquad F(n) = 2^{(n+1)/2}\,(2 + n)\,\Gamma\!\left(\frac{n+3}{2}\right).
$$
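The closed form for F(n) can be checked against direct quadrature of I_n at γ = 0. The sketch below evaluates the integral with a midpoint rule on a truncated ρ range. This truncation is harmless here because at γ = 0 the integrand decays like e^{−(2η−c)²ρ²/2}. It then compares the result with F(n)/|2η − c|^{3+n}. The test values of n, η and c are arbitrary.

```python
import math

def i_n_numeric(n, eta, c, gamma=0.0, rho_max=12.0, steps=20_000):
    """Midpoint-rule quadrature of I_n(eta, c; gamma), with P(eta*rho, c*rho) expanded."""
    d = rho_max / steps
    total = 0.0
    for k in range(steps):
        rho = (k + 0.5) * d
        a, x = eta * rho, c * rho
        # rho^(2+n) * exp(gamma*c*rho - c^2 rho^2 / 2) * [(x - 2a)^2 - 1] * exp(2a(x - a))
        exponent = gamma * x - 0.5 * x * x + 2 * a * (x - a)
        total += rho ** (2 + n) * ((x - 2 * a) ** 2 - 1.0) * math.exp(exponent)
    return total * d

def i_n_closed(n, eta, c):
    """I_n(eta, c; 0) = F(n) / |2 eta - c|^(3 + n)."""
    f_n = 2 ** ((n + 1) / 2) * (2 + n) * math.gamma((n + 3) / 2)
    return f_n / abs(2 * eta - c) ** (3 + n)

for n, eta, c in [(2, 1.0, 0.5), (4, -0.5, 0.5)]:
    print(n, i_n_numeric(n, eta, c), i_n_closed(n, eta, c))
```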